Configuring a DataSource in an Enterprise Application Using Payara Server

Connecting to a data source is a very common task in enterprise applications, and the main interface for it is the DataSource interface from Java’s JDBC API. In this post we’ll explore how to configure a DataSource implementation in the Payara application server environment. I will be using Eclipse (the Enterprise edition) as the IDE, AdoptOpenJDK (a build of OpenJDK 8) for running the server, and MariaDB for the database. The Payara Server also works with Oracle’s JDK and Azul Zulu. The goal is to see how to acquire connections to a database in a Java EE environment via the DataSource interface, which is the preferred means of doing so according to the JDBC specification.

Requirements

First we need to download and install Payara Server. You can download it from this link. Payara has several product options for download, including a microservices-oriented version and Docker images. You can just download the normal Server Full installation, which is roughly 140 MB. The latest version of the server (v5.192) supports running on JDK 11, although that support seems to be in preview mode.

To facilitate developing in Eclipse, we need to install a plugin that can recognize Payara as the application server. This can be done by selecting Help -> Eclipse Marketplace, then searching for Payara in the search box. The plugin to install is called “Payara Tools”.

[Screenshot: Eclipse Marketplace showing the Payara Tools plugin]

For the purpose of this simple demo app, the database of choice will be MariaDB, an open source database that is very close to MySQL. Even though MariaDB is not listed among the vendors when configuring the database in the Payara console, as we will see later it is possible to simply specify the DataSource implementation class and it works fine.

Links to these prerequisites are provided at the end of this post.

Setting up the project

The sample application will be a simple Web app with a single HTTP servlet that connects to the database and prints information about it using JDBC. The simplest way to create the project is from Eclipse, selecting New -> Dynamic Web Project. Give the project a name and click the New Runtime button, which opens a dialog to define a target server referencing the installed Payara server. There you specify the local installation of the server and which JDK it will use. These steps are shown in the two images below.

[Screenshot: New Dynamic Web Project dialog]
[Screenshot: New Server Runtime Environment dialog]

You can either click Finish to create the project, or go through the remaining pages if you want to change the context root, as shown below, which will affect the URL used to access the app from the browser (it defaults to the name of the project). Note that we will not be using any web.xml descriptor here.

[Screenshot: New Dynamic Web Project dialog, context root page]

Next we create a servlet under the source directory by selecting New -> Servlet, e.g. in a class com.example.SampleServlet. Eclipse will generate the basic methods to handle HTTP GET and POST requests (namely, doGet() and doPost()). A starting point would be something like the following:

@WebServlet(name = "SampleServlet", urlPatterns = "/dbinfo")
public class SampleServlet extends HttpServlet {
   ...
   @Override
   protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
      response.getWriter().append("Served at: ").append(request.getContextPath());
   }
}

We can run the app directly from Eclipse by right-clicking on the project and selecting Run As -> Run on Server. A dialog will open to select the target server, which will be populated with the one previously defined.

[Screenshot: Run On Server dialog]

Select Next and make sure that all properties are filled correctly with no error shown in the dialog. Note the populated domain path corresponding to domain1 (the default domain in Payara) for our Web app.

[Screenshot: Run On Server dialog, server properties page]

Finishing the run dialog will automatically start Payara and open a browser to the root URL of the app. Our initial servlet should respond with a simple text at the http://localhost:8080/myapp/dbinfo endpoint.

[Screenshot: Browser showing the first run of the app]

This is a quick way to run from within Eclipse (you can also stop the server, clean its state, and otherwise manage it in the Servers view). But it can be more convenient to run Payara separately and manage it from its Admin Console.

Managing Payara from the console

To stop the running instance of Payara, run on the command line:

asadmin stop-domain

The asadmin executable is located under payara_install/bin. Payara is derived from GlassFish Server Open Source Edition, so all of GlassFish’s administration commands can be used with Payara. To run the server again:

asadmin start-domain

Once started, you can open the Admin Console at http://localhost:4848, where you can manage apps and resources on the server. In the Applications section, we can already see the application that Eclipse has deployed.

[Screenshot: Payara Admin Console showing the deployed application]

Next we’ll update the application to connect to the data source. The way to access the DataSource object in this context is using the JNDI lookup facility provided by the application server. We will first register the DataSource of the database via the Admin Console, then inject it via its JNDI name.

Setting up the database

First we start the database process, which can be done using the mysqld executable. MariaDB has a predefined database called test, so we can connect to it to create some tables. See Connecting to MariaDB for how to do this. If you choose a different database vendor, the steps to connect will differ, so check the documentation of that particular database.
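
For illustration only (this snippet is not part of the original setup steps), here is a minimal JDBC sketch that creates a sample table in the test database; the connection URL, credentials and table definition are assumptions, and the MariaDB driver must be on the classpath:

try (Connection connection = DriverManager.getConnection(
        "jdbc:mariadb://localhost:3306/test", "root", "");
     Statement statement = connection.createStatement()) {
    // create a hypothetical sample table to have something to list later
    statement.executeUpdate("create table if not exists person "
            + "(id int primary key, name varchar(100))");
}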

Assuming we have populated the database test with a few tables, the next step is to register a DataSource for it in Payara. This requires adding the JDBC driver Jar to the server so that it can access the DataSource implementation. The JDBC driver for MariaDB can be downloaded from this link. To add the Jar to Payara, run the command:

asadmin add-library path_to_jar

In the Admin Console at http://localhost:4848, go to Resources -> JDBC -> JDBC Connection Pools in the left navigation panel. A list of existing connection pools is shown, e.g. for H2, which comes with Payara by default. Click the New button to create a new Connection Pool.

The pool can be given any human-friendly name, and we specify the name of the DataSource implementation, org.mariadb.jdbc.MySQLDataSource. Upon providing the implementation, Payara seemingly uses it to derive a list of properties for configuring the DataSource, e.g. the host, port and database, which need to be filled in.

[Screenshot: New JDBC Connection Pool (Step 1 of 2)]

[Screenshot: New JDBC Connection Pool (Step 2 of 2)]

After filling all needed properties and saving the JDBC Connection Pool, make sure that clicking the Ping button successfully connects to it.

[Screenshot: Successful Ping on the connection pool]

Now that the JDBC Connection Pool is created, we can register a JNDI binding for the DataSource, by going to Resources -> JDBC -> JDBC Resources, and selecting New. The JDBC Resource must reference the Pool just created and have a unique name.

[Screenshot: New JDBC Resource dialog]

Accessing the DataSource

With the DataSource registered, the servlet class can declare it as a dependency using @Resource:

@Resource(name="java/myapp/jdbc_ds")
DataSource dataSource;

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    try(Connection connection = dataSource.getConnection()) {
        DatabaseMetaData databaseMetaData = connection.getMetaData();
        PrintWriter writer = response.getWriter();
        writer.append("DB info: ")
              .append(databaseMetaData.getDatabaseProductName())
              .append("\nCatalog name: ")
              .append(connection.getCatalog());
        try(ResultSet tables = databaseMetaData.getTables(connection.getCatalog(), null, null, new String[] { "TABLE" })) {
            while(tables.next()) {
                writer.append("\n" + tables.getString("TABLE_NAME"));
            }
        }
    } catch(SQLException ex) {
        response.getWriter().append("Error " + ex.getMessage());
        getServletContext().log("Error connecting to DB", ex);
    }
}
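
If injection is not available, for example outside a managed component, the DataSource can also be looked up programmatically through JNDI. A minimal sketch, assuming the same resource name registered above:

try {
    InitialContext context = new InitialContext();
    // look up the JDBC resource registered in the Admin Console
    DataSource dataSource = (DataSource) context.lookup("java/myapp/jdbc_ds");
} catch (NamingException ex) {
    // handle a missing or misconfigured resource
}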

Note that the underlying implementation uses a pool of connections: each request borrows a connection from the pool and returns it once done. The pool configuration can be adjusted in the JDBC Connection Pool just created.

We can now run the example just as we did from Eclipse. Another way is to package the app in a WAR file by right-clicking on the project and selecting Export -> WAR file, then deploy it by running:

asadmin deploy --force path_to_war_file

Note the use of --force to force redeploying the app if it already exists. The result of http://localhost:8080/myapp/dbinfo should look like the screenshot below.

[Screenshot: Servlet output listing database information]

Download links

OpenJDK 8 – https://adoptopenjdk.net/index.html
Payara Server – https://www.payara.fish/software/downloads
Eclipse – https://www.eclipse.org/downloads/
Payara Tools for Eclipse – https://marketplace.eclipse.org/content/payara-tools
MariaDB – https://mariadb.org/download/
MariaDB JDBC driver – https://downloads.mariadb.com/Connectors/java/

References

https://docs.payara.fish/documentation/payara-server/
https://eclipse-ee4j.github.io/glassfish/docs/latest/quick-start-guide/toc.html
https://blog.payara.fish/an-intro-to-connection-pools-in-payara-server-5


Guide to Behavior-Driven Development in Java

When working on a software project in a team that includes people with different roles, such as in agile environments, there is always a risk of misalignment in the understanding of end user requirements and what the software should do. The developer may not fully understand them because they may not be clearly formulated by the product owner. The product owner may not realize the complexity of the task being assigned for development and the impact it may have on its delivery. The tester may reason about different edge cases or scenarios that would have been easier to account for at an early stage of the development.

To improve collaboration between business and developers, behavior-driven development (BDD) emerged as a relatively recent software development approach that builds on the main ideas of test-driven development (TDD) but tests at a higher level of granularity: instead of unit tests for classes and methods, the tests are acceptance tests that validate the behavior of the application. These acceptance tests are derived from concrete examples formulated by the team members, so that the behavior of the system is better understood. When these example scenarios are formulated during conversations between the different members, the requirements are likely to be expressed more clearly, the input of the developer will likely be incorporated into them, and the tester will contribute more scenarios to cover in the tests.

Once these example scenarios are produced, they can be expressed in a format that is easy to read by non-developers, yet follows a certain template that makes it executable by a BDD tool such as Cucumber or JBehave. This format, called the Gherkin syntax, can serve multiple purposes at once:

  1. The scenarios act as executable specifications for the behavior of the feature under test.
  2. These specifications can be executed as automated regression tests.
  3. The scenarios act as documentation about the feature that follows the main code in a version control system.

[Diagram: BDD workflow with Cucumber]

In Cucumber, which supports several programming languages, such scenarios are written in .feature files that can be added in the project along with the test code. Each file contains scenarios for a specific feature, and each scenario consists of steps, where a step starts for example with Given, When or Then. These steps specify what the scenario is, what assumption(s) it uses, and how the feature will behave in terms of the outcome. In order to execute these steps, we also need the test code (also known as glue code) that will perform whatever action the steps should do. Each step in the feature files will be mapped to a Java method that contains its step definition.

Sample project

As a demonstration, let’s assume we have a simple food ordering application where we want to implement features for adding and removing a meal item from the user’s order. For convenience, let’s create a new project using Cucumber’s Maven archetype support, which should set up the project directory with the minimum code so that we can simply add feature files and step definition classes.

mvn archetype:generate -DarchetypeGroupId=io.cucumber                    \
   -DarchetypeArtifactId=cucumber-archetype -DarchetypeVersion=2.3.1.2   \
   -DgroupId=com.example -DartifactId=cucumber-example                   \
   -Dpackage=com.example.cucumber -Dversion=1.0.0-SNAPSHOT               \
   -DinteractiveMode=false

This should generate a project with a POM file that includes dependencies on the Cucumber artifacts in addition to JUnit, which is itself relied upon to run the tests:

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-java</artifactId>
    <version>4.2.0</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-junit</artifactId>
    <version>4.2.0</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>

Note: It seems the archetype generates dependency snippets referencing an old version of Cucumber, so in the above dependencies I updated them to the latest retrieved from Maven Central.

The entry point is in the file RunCucumberTest.java, which defines an empty class annotated with @RunWith(Cucumber.class) so that JUnit invokes the custom Cucumber runner, which will automatically scan for feature files and corresponding step definitions and execute them:

@RunWith(Cucumber.class)
@CucumberOptions(plugin = {"pretty"})
public class RunCucumberTest {
}

The @CucumberOptions annotation specifies the built-in “pretty” formatter plugin for the report containing test results. The annotation can also be used to specify other options.
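
As an illustration (the values below are hypothetical, not part of the generated project), the annotation could also specify where to find feature files and glue code, an extra report format, and a tag filter:

@RunWith(Cucumber.class)
@CucumberOptions(
        plugin = {"pretty", "html:target/cucumber-report"},    // also write an HTML report
        features = "src/test/resources/com/example/cucumber",  // where .feature files live
        glue = "com.example.cucumber",                          // package with step definitions
        tags = "@addItem")                                      // run only scenarios with this tag
public class RunCucumberTest {
}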

With the project set up and after importing it into an IDE, we can start adding our features to the food ordering service, which is assumed to already exist in a class FoodOrderingService (let’s imagine the application already existed before adding features to it). The features to be implemented are adding and removing an item from the current order, as shown in the below code (for conciseness, Lombok annotations are used):

@EqualsAndHashCode(of = "name")   // items are identified by name
@AllArgsConstructor
public class Item {
    @NonNull String name;
    @NonNull String category;
}

@Getter
public class Order {
    List<Item> items = new ArrayList<>();
    BigDecimal price = BigDecimal.ZERO;
}

public class FoodOrderService {

    private Order order = new Order();

    public Optional<Order> getOrder() {
        return Optional.ofNullable(order);
    }

    public void addItem(Item item) {
        // TODO
    }

    public void removeItem(Item item) {
        // TODO
    }

}

Before implementing these features, we add corresponding .feature files that contain some scenarios to describe their behaviors. We can treat these as two features: adding an item to an order, and removing an item from an order. Here is a simple feature file for adding an item. For the sake of brevity, the feature file for removing an item is omitted (it can be viewed in the source code linked to at the end of this post).

Feature: Adding an item to order
  I want to be able to add an item to a current order.

  Scenario: Adding an item to an empty order
    Given I have not yet ordered anything
    When I go to the "Burgers" category
    And I select a "Cheeseburger"
    Then I have a new order
    And the order has 1 item in it

  Scenario Outline: Price of a single item order
    Given I have not yet ordered anything
    When I go to the "<category>" category
    And I select <item>
    Then my current order total is <price>

    Examples: 
      | category   | item                 | price |
      | Sandwiches | a "Chicken Sandwich" | $9    |
      | Dessert    | an "Oreo Cheesecake" | $7    |

The file starts with the Feature keyword and a short description of the feature, followed by a more elaborate description that can serve as documentation, and two scenarios for adding an item. The second scenario (called a scenario outline) illustrates how to repeat a certain scenario for different values.

Next we need to add the step definitions for these steps (the lines starting with Given, When, And, Then, etc). We already have a file src/test/java/com/example/cucumber/Stepdefs.java which was generated with the Maven archetype, so we can add our step definitions there:

public class Stepdefs {

    FoodOrderService foodOrderService;
    String category;

    @Given("I have not yet ordered anything")
    public void no_order_yet() {
        foodOrderService = new FoodOrderService();
    }

    @When("I go to the {string} category")
    public void i_go_to_category(String category) {
        this.category = category;
    }

    @When("I select a/an {string}")
    public void i_select_item(String itemName) {
        foodOrderService.addItem(new Item(itemName, category));
    }

    @Then("I have a new order")
    public void i_have_new_order() {
        assertTrue("Order was null", foodOrderService.getOrder().isPresent());
    }

    @Then("the order has {int} item(s) in it")
    public void order_has_n_item_in_it(int itemCount) {
        assertEquals("Wrong number of items in order",
                itemCount, foodOrderService.getOrder().get().getItems().size());
    }

    @Then("my current order total is \\$([\\d\\.]+)")
    public void current_order_total_is(String price) {
        assertEquals("Wrong order price",
                new BigDecimal(price), foodOrderService.getOrder().get().getPrice());
    }

}

Note that the @Then annotated methods are typically where we do assertions against expected values.

Mapping steps to their step definitions

The way Cucumber maps each step to its definition is simple: Before a scenario is run, every step definition class will be instantiated and annotated methods (with @Given, @Then, etc) will be mapped to the steps by the expression in the annotation. The expression can be either a regular expression, or a Cucumber expression. In the above step definitions, some methods use Cucumber expressions, e.g. capturing integer parameters using {int}. To use these expressions, an additional dependency needs to be added to the POM:

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-expressions</artifactId>
    <version>6.2.0</version>
    <scope>test</scope>
</dependency>

Running the tests using mvn test results in the following expected errors:

Tests run: 3, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 0.561 sec <<< FAILURE!
Adding an item to an empty order(Adding an item to order)  Time elapsed: 0.032 sec  <<< FAILURE!
java.lang.AssertionError: Order was null
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at com.example.cucumber.Stepdefs.i_have_new_order(Stepdefs.java:30)
        at ?.I have a new order(com/example/cucumber/adding_an_item.feature:26)

Price of a single item order(Adding an item to order)  Time elapsed: 0 sec  <<< ERROR!
java.util.NoSuchElementException: No value present
        at java.util.Optional.get(Optional.java:135)
        at com.example.cucumber.Stepdefs.current_order_total_is(Stepdefs.java:42)
        at ?.my current order total is $9(com/example/cucumber/adding_an_item.feature:33)

Price of a single item order(Adding an item to order)  Time elapsed: 0 sec  <<< ERROR!
java.util.NoSuchElementException: No value present
        at java.util.Optional.get(Optional.java:135)
        at com.example.cucumber.Stepdefs.current_order_total_is(Stepdefs.java:42)
        at ?.my current order total is $7(com/example/cucumber/adding_an_item.feature:33)

The next step is to implement the features to make the above tests pass. As a starting point, the price information is encapsulated in a BasicItemRepository class, which contains just enough logic to make the tests pass. Later we can improve it by querying the information from a database, and re-run the tests to make sure no regression occurred during the improvement. For now, we keep it simple by checking the item name and returning the appropriate price.

public class FoodOrderService {

    private final ItemRepository itemRepository;
    private Order order;

    public FoodOrderService() {
        itemRepository = new BasicItemRepository();
    }

    public Optional<Order> getOrder() {
        return Optional.ofNullable(order);
    }

    public void addItem(Item item) {
        if(order == null) {
            order = new Order();
        }
        order.items.add(item);

        BigDecimal itemPrice = itemRepository.getItemPrice(item);
        order.price = order.price.add(itemPrice);
    }

    public void removeItem(Item item) {
        getOrder().ifPresent(order -> {
            order.items.remove(item);
            order.price = order.price.subtract(itemRepository.getItemPrice(item));
        });
    }
}

interface ItemRepository {
    BigDecimal getItemPrice(Item item);
}

public class BasicItemRepository implements ItemRepository {

    @Override
    public BigDecimal getItemPrice(Item item) {
        if(item.name.equalsIgnoreCase("Chicken Sandwich")) {
            return new BigDecimal(9);
        } else if(item.name.equalsIgnoreCase("Oreo Cheesecake")) {
            return new BigDecimal(7);
        } else if(item.name.equalsIgnoreCase("Cheeseburger")) {
            return new BigDecimal(9);
        }
        throw new IllegalArgumentException("Unknown item " + item.name);
    }
}

Running the scenarios again with mvn clean test results in a successful build.

Some improvements to the organization of scenarios and step definitions

Background steps

In the previous feature file, the same Given step was repeated in every scenario. If at least one Given is shared by all scenarios in the feature, it can be moved to a Background:

Feature: Adding an item to order
  I want to be able to add an item to a current order.

  Background:
    Given I have not yet ordered anything

  Scenario: Adding an item to an empty order
    When I go to the "Burgers" category
    And I select a "Cheeseburger"
    Then I have a new order
    And the order has 1 item in it

  Scenario Outline: Price of a single item order
    When I go to the "<category>" category
    And I select <item>
    Then my current order total is <price>

    ...

Organizing step definitions and their dependencies

The mapping between steps and the methods containing their definitions does not depend on the class in which the method is defined. As long as Cucumber finds one method with a matching expression, it will run that method. This leaves the decision of where to place step definitions up to the developer. As is the case with the classes of the main code, step definition classes should be organized in a logical way to make their maintenance easier, especially when the number of tests increases.

One of the biggest challenges when writing step definitions is maintaining the state between dependent steps in a given scenario. As shown in the Stepdefs class, a field category was used to save the parameter passed to the "I go to the {string} category" step. The field was subsequently used in the next step. This is a simple way to maintain state if every feature file has a separate class that encapsulates all of its step definitions.

Sometimes, however, we may want to split step definitions into more than one class for better maintainability. The best way to share state between inter-class step definitions is to use a shared object, and use dependency injection to pass that object to every instance that needs it. The Cucumber project has bindings to several dependency injection frameworks, including Spring and Guice. If the project is already using a DI framework, it’s probably better to use it in the tests. Otherwise, the simplest one to use is PicoContainer.
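
For a taste of the PicoContainer option (a hedged sketch; SharedState is a hypothetical holder class), adding the cucumber-picocontainer artifact to the test classpath is enough for Cucumber to create one instance of any constructor parameter type per scenario and inject that same instance into every step definition class that asks for it:

public class SharedState {   // hypothetical class holding the shared fields
    String category;
}

public class SomeStepdefs {
    private final SharedState state;

    // PicoContainer injects the same SharedState instance per scenario
    public SomeStepdefs(SharedState state) {
        this.state = state;
    }
}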

To carry out this state management between several classes, let’s assume that we want to split the Stepdefs class into two classes: ItemStepdefs and OrderStepdefs. The first class fills the object with state, and the second uses that state in the steps that need it. Such a split may not be strictly necessary for a feature this small, but it illustrates the approach. For this example, let’s use the Spring solution; the PicoContainer one, as sketched above, is straightforward and does not require any configuration or annotations. First we add the required dependencies. We need both the Cucumber binding and the Spring dependencies because our sample project did not initially use Spring:

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-spring</artifactId>
    <version>4.2.0</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-beans</artifactId>
    <version>5.1.3.RELEASE</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>5.1.3.RELEASE</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-test</artifactId>
    <version>5.1.3.RELEASE</version>
    <scope>test</scope>
</dependency>

Note also the dependency on spring-test.

First we create a class that contains the state to be shared between the step definitions, and annotate it with @Component:

@Component
public class ItemOrderInfo {

    String category;
    FoodOrderService foodOrderService;

}

We also need a configuration class for Spring. We assume that the above @Component class is in the same package as this configuration class:

@Configuration
@ComponentScan
public class SpringTestConfig {
}

Next we annotate one of the two step definition classes with @ContextConfiguration from spring-test, pointing to the test configuration class just created. At this point we can use Spring’s dependency injection mechanism to provide a singleton instance of ItemOrderInfo, the class containing the state:

@ContextConfiguration(classes = SpringTestConfig.class)
public class ItemStepdefs {

    @Autowired
    ItemOrderInfo itemInfo;

    @Given("I have not yet ordered anything")
    public void no_order_yet() {
        itemInfo.foodOrderService = new FoodOrderService();
    }

    @When("I go to the {string} category")
    public void i_go_to_category(String category) {
        this.itemInfo.category = category;
    }
}

We can use the same object in the other step definition class:

public class OrderStepdefs {

    @Autowired
    ItemOrderInfo itemInfo;

    @When("I select a/an {string}")
    public void i_select_item(String itemName) {
        itemInfo.foodOrderService.addItem(new Item(itemName, itemInfo.category));
    }

    @Then("I have a new order")
    public void i_have_new_order() {
        assertTrue("Order was null", itemInfo.foodOrderService.getOrder().isPresent());
    }

    ...
}

Hooks

There are some annotations that can be used to hook into the lifecycle of a scenario. For example, to prepare something before every scenario, we can add it in a @Before annotated method (this is different from the org.junit.Before annotation provided by JUnit):

@Before
public void prepare(){
    // Set up something before each scenario
}

Normally this is where things like initializing a resource or preparing a test database can be done.

On the other hand, the @After annotation allows executing code after each scenario. There are also @BeforeStep and @AfterStep annotations.
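
For example, here is a minimal sketch of an @After hook that inspects the scenario outcome (the Scenario parameter is optional):

@After
public void cleanUp(Scenario scenario) {
    if (scenario.isFailed()) {
        // e.g. capture diagnostics or reset external state after a failure
    }
}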

Filtering scenarios using tags

In some cases we want to run only a subset of scenarios. A handy feature called tags allows labeling specific features or scenarios such that we can reference them when running the tests. The feature file we have so far can be enriched with tags as follows:

@addItem
Feature: Adding an item to order
  I want to be able to add an item to a current order.

  @empty
  Scenario: Adding an item to an empty order
    Given I have not yet ordered anything
    When I go to the "Burgers" category
    And I select a "Cheeseburger"
    Then I have a new order
    And the order has 1 item in it

  @price
  Scenario Outline: Price of a single item order
    Given I have not yet ordered anything
    When I go to the "<category>" category
    And I select <item>
    ...

To run only scenarios tagged with @price, we can pass the tag in the cucumber.options system property:

mvn clean test -Dcucumber.options='--tags "@price"'

The hook annotations (@Before and @After) shown earlier can also take tag expressions to restrict their execution.
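
For instance, a hook can be restricted to scenarios carrying a given tag (a sketch):

@Before("@price")
public void preparePriceData() {
    // runs only before scenarios tagged with @price
}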

Conclusion

The above sample project illustrates a simple workflow that follows behavior-driven development practices: deriving scenarios about our features, formulating them in a natural language syntax, and using them to drive the implementation. The source code can be found here.

Further resources

https://dannorth.net/introducing-bdd/
https://docs.cucumber.io/cucumber/
https://github.com/cucumber/cucumber-jvm/

Batch Updates in Java Persistence Technologies

Relational data access often needs to insert or update multiple rows in a database as part of a single operation. In such scenarios, it is a good idea to use the batch update facilities built on top of JDBC, in order to submit multiple SQL commands as a single request to the underlying database. This reduces the number of roundtrips to the database, thereby improving the overall time of the operation.

JDBC batched updates

The Statement interface and its subinterfaces, PreparedStatement and CallableStatement, support executing multiple SQL statements as a batch: the application builds up a collection of statements using the method Statement.addBatch(sql). When the batch of statements is ready to be executed, the method Statement.executeBatch() can be called to execute them in one unit. To clear the current batch, the application can call the method Statement.clearBatch(). Only statements that return an update count are eligible for batch execution; select statements will cause a BatchUpdateException to be thrown.

Example

The following code uses a Statement batch to insert a student, a course, and a row enrolling the student in the course:

try(Connection connection = dataSource.getConnection()) {
    connection.setAutoCommit(false);

    try(Statement statement = connection.createStatement()) {
        statement.addBatch("insert into student values (14, 'John Doe')");
        statement.addBatch("insert into course values (3, 'Biology')");
        statement.addBatch("insert into student_courses values (14, 3)");

        int[] updateCounts = statement.executeBatch();
        connection.commit();

    } catch(BatchUpdateException ex) {
        connection.rollback();
        ... // do something with exception
    }
}

Here is another example, using a PreparedStatement. Given a customer table, we want to import a list of customers. Notice that the method addBatch does not take an SQL string here; instead, it adds the specified parameters to the prepared statement’s batch of commands.

connection.setAutoCommit(false);
try(PreparedStatement statement = connection.prepareStatement("insert into customer values (?, ?)")) {
    int n = 0;
    for(Customer customer : customers) {
        statement.setInt(1, ++n);
        statement.setString(2, customer.getName());
        statement.addBatch();
    }
    int[] updateCounts = statement.executeBatch();
    connection.commit();
} catch(BatchUpdateException ex) {
    connection.rollback();
    ... // do something with exception
}

Switching off auto-commit

One important thing to notice in the above code snippets is the call to connection.setAutoCommit(false), which allows the application to control when to commit the transaction. In the previous code, we only commit the transaction when all statements are executed successfully. In case of a BatchUpdateException thrown because of a failed statement, we roll back the transaction so that the database is left unchanged. We could also examine the BatchUpdateException (as we’ll see shortly) to find which statement(s) failed and still decide to commit the statements that were processed successfully.

Disabling auto-commit mode should always be done when executing a batch of updates. Otherwise, the result of the updates depends on the behavior of the JDBC driver: it may or may not commit the successful statements.

Update counts and BatchUpdateException

The method Statement.executeBatch() returns an array of integers where each value is the number of affected rows by the corresponding statement. The order of values matches the order in which statements are added to the batch. Specifically, each element of the array is:

  1. an integer >= 0, reflecting the affected row count by the update statement,
  2. or the constant Statement.SUCCESS_NO_INFO, indicating that the statement was successful but the affected row count is unknown.

In case one of the statements failed, or was not a valid update statement, the method executeBatch() throws a BatchUpdateException. The exception can be examined by calling BatchUpdateException.getUpdateCounts(), which returns an array of integers, as illustrated in the sketch following the list below. There are two possible scenarios:

  1. If the JDBC driver allows continuing the processing of remaining statements upon a failed one, then the result of BatchUpdateException.getUpdateCounts() is an array containing as many integers as there were statements in the batch, where the integers correspond to the affected row count for successful statements, except for the failed ones where the corresponding array element will be the constant Statement.EXECUTE_FAILED.
  2. If the JDBC driver does not continue upon a failed statement, then the result of BatchUpdateException.getUpdateCounts() is an array containing the affected row count for all successful statements until the first failed one.
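
Here is a minimal sketch of inspecting the exception, assuming statement and connection as in the earlier examples:

try {
    int[] updateCounts = statement.executeBatch();
    connection.commit();
} catch (BatchUpdateException ex) {
    int[] counts = ex.getUpdateCounts();
    // if counts.length is smaller than the batch size, the driver
    // stopped processing at the first failed statement
    for (int i = 0; i < counts.length; i++) {
        if (counts[i] == Statement.EXECUTE_FAILED) {
            // the driver continued past this statement; it failed
        } else if (counts[i] == Statement.SUCCESS_NO_INFO) {
            // the statement succeeded but the row count is unknown
        }
        // otherwise counts[i] is the affected row count
    }
    connection.rollback();
}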

Batch updates using Spring’s JdbcTemplate

Spring offers a convenient class, JdbcTemplate, as part of its support for JDBC. It reduces the amount of boilerplate code required with plain JDBC, such as processing result sets and closing resources. It also makes batch updates easier, as shown in the following example:

List<Customer> customers = ...;

jdbcTemplate.batchUpdate("insert into customer values (?, ?)",
             new BatchPreparedStatementSetter() {

    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        ps.setLong(1, customers.get(i).getId());
        ps.setString(2, customers.get(i).getName());
    }

    @Override
    public int getBatchSize() {
        return customers.size();
    }
});

Batch updates using Hibernate

Hibernate can also make use of JDBC’s batching facility when generating the statements corresponding to its persistence operations. The main configuration property is hibernate.jdbc.batch_size, which specifies the maximum batch size. This setting can be overridden for a specific session using the method Session.setJdbcBatchSize(). Hibernate will use the value specified in the method on the current session and, if not set, the value of the global session factory-level setting hibernate.jdbc.batch_size.
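
As an illustration (a sketch, not from the original post), the global property can be set when configuring the session factory programmatically, and overridden on a session:

// global setting, e.g. when building the SessionFactory programmatically
Configuration configuration = new Configuration();
configuration.setProperty("hibernate.jdbc.batch_size", "50");

// per-session override of the global value
session.setJdbcBatchSize(50);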

The earlier example that stores a list of customers would use the persistence methods in the Session instance:

Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
    transaction = session.getTransaction();
    transaction.begin();

    for (Customer customer : customers) {
        session.persist(customer);
    }

    transaction.commit();
} catch (RuntimeException ex) {
    if (transaction != null) {
        transaction.rollback();
    }
    throw ex;
}

When transaction.commit() is invoked, Hibernate will send the SQL statements that insert the customer rows. If batching is enabled as described earlier (either via hibernate.jdbc.batch_size or by calling Session.setJdbcBatchSize(batchSize)), then all the generated statements will be sent as a single request. Otherwise, each statement is sent in its own request.

When employing batched updates in Hibernate for a large number of entity objects, it is good practice to flush the session and clear its cache periodically, as opposed to flushing only at the end of the transaction. This reduces memory usage, because the session cache holds all entities that are in the persistent state:

Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
    transaction = session.getTransaction();
    transaction.begin();

    int n = 0;
    for (Customer customer : customers) {
        if (++n % batchSize == 0) {
            // Flush and clear the cache every batch
            session.flush();
            session.clear();
        }
        session.persist(customer);
    }

    transaction.commit();
} catch (RuntimeException ex) {
    if (transaction != null) {
        transaction.rollback();
    }
    throw ex;
}

One important thing to know is that batch insert (not update or delete) doesn’t work with entities using identity columns (i.e. whose generation strategy is GenerationType.IDENTITY), because Hibernate needs to generate the identifier when persisting the entity, and with identity columns the value can only be obtained by executing the insert statement.
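
If the target database supports sequences, switching to a sequence-based generator keeps insert batching possible. A hedged sketch of such a mapping:

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_seq")
    @SequenceGenerator(name = "customer_seq", sequenceName = "customer_seq", allocationSize = 50)
    private Long id;

    private String name;
}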

It should be noted that the above applies equally if the application uses an EntityManager instead of directly using a Session.

Batch updates using jOOQ

jOOQ also makes batch updates easy. Here’s an example that mirrors the earlier ones:

DSLContext create = ...;
BatchBindStep batch = create.batch(create.insertInto(CUSTOMER, ID, NAME)
                                         .values((Integer) null, null));
int n = 0;
for (Customer customer : customers) {
    batch.bind(++n, customer.getName());
}
int[] updateCounts = batch.execute();

Summary

All major Java persistence technologies support batch updates to relational databases by leveraging the JDBC API. This mode can improve performance for applications with heavy workloads by reducing the number of network roundtrips to the database server.

10 Effective Tips on Using Maven

Maven is without a doubt the most popular build automation tool for software projects in the Java ecosystem. It has long since replaced Ant, thanks to an easier and declarative model for managing projects, dependency management and resolution, well-defined build phases such as compile and test, and support for plugins that can do anything related to building, configuring and deploying your code. It was estimated to be used by 60% of Java developers in 2018.

Over the years, a number of usage scenarios and commands turned out to be quite useful for me when working on Maven based projects. Here are a few usage tips that help in using Maven more effectively. There are definitely many more, and one can obviously learn something new everyday for a specific use case, but these are the ones I think can be commonly applied. Note that the focus here is on aspects like command line usage, troubleshooting a certain issue, or making repetitive tasks easier. Hence you won’t find practices like using dependencyManagement to centralize dependencies, which are rather basic anyway and more used in initially composing a POM.

Friendly disclaimer: if you’re new to Maven or haven’t had enough experience using it, it’s better to set aside some time to learn about its basics, instead of trying to learn by way of tips and tricks.

1. Fetching a project’s dependency tree

This one is a no-brainer, but it is key to resolving dependency-related issues such as using wrong versions. It is provided by the dependency:tree goal of the maven-dependency-plugin. You can simply run the command below to display a tree of all dependencies used in your current project (optionally piping through less to scroll through the result, assuming you’re working on a big enough project):

$ mvn dependency:tree | less

Note that IDEs can visualize this hierarchy of dependencies in the POM editor. In Eclipse, for example, it can be viewed on the “Dependency Hierarchy” tab of the POM editor.

2. Analyzing dependencies

It is a good practice to declare in the POM only those dependencies that a project actually uses, and often you want to explicitly declare dependencies your project uses even if they are transitively included. This makes the POM cleaner, just like it’s a good practice to remove unused imports and declare those for types you use in Java code.

To do that, either run the dependency:analyze goal as a standalone command:

$ mvn dependency:analyze

Whenever the plugin finds an unused dependency that is declared in the POM, or a used dependency that is undeclared, a warning is shown in the output. If a build failure needs to be raised because of this, the parameter failOnWarning can be set to true:

$ mvn dependency:analyze -DfailOnWarning=true

Another way is to use the dependency:analyze-only goal, which does the same thing, but should be used within the build lifecycle, i.e. it can be integrated into the project’s POM:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>analyze-deps</id>
            <goals>
                <goal>analyze-only</goal>
            </goals>
        </execution>
    </executions>
</plugin>

3. Skipping tests during a local build

When building a project on a development machine, you may want to skip existing unit and integration tests, perhaps because you want to build the code more quickly or because you don’t care about tests for the moment. Maybe you want to run tests only after you feel you have a first draft of your commit ready to be tested. Note that this should never be done on a CI/CD machine that builds and deploys to a production or a staging environment.

There are two options to consider:

  1. skipping the running of tests
    You can do it with mvn package -DskipTests=true. You can shorten the property to just -DskipTests.
  2. skipping the compilation and running of tests (not recommended)
    You can do it with mvn package -Dmaven.test.skip=true. You can shorten the property to just -Dmaven.test.skip.

The latter skips all testing-related tasks (both compiling and running tests), so it may make the build slightly faster, but -DskipTests is recommended instead because it allows you to detect changes that broke the tests at compile time. This is often important, as errors discovered late may require re-iterating on the changes in the main code, perhaps with some refactoring to make the code easier to test.

Bonus tip: Consider running tests in parallel, as described in the Surefire plugin documentation. This is a much better long term solution, but the cost is that you should make sure parallel tests are independent and don’t cause concurrency issues because they will share the same JVM process.

4. Debugging unit tests

The aforementioned properties are understood by the maven-surefire-plugin, which is responsible for running unit tests. This plugin is invoked during the test phase of the build lifecycle. Sometimes you don’t want to debug a failing test in your IDE, maybe because you’re like me and don’t always trust that the IDE is running the test with new changes. Sometimes you have a command line window and just want to stick to it. In that case, pass a property to the plugin as follows:

$ mvn clean package -Dmaven.surefire.debug

This will cause the plugin to listen for a remote debugger on port 5005. Now you can configure a remote debugging session in your IDE to connect to the listening plugin and execute the tests in debug mode.

Bonus tip: If you ever need to do the same with integration tests, just use the property -Dmaven.failsafe.debug instead. The name comes from the maven-failsafe-plugin which is responsible for running integration tests.

5. Running a specific test

So you debugged a failing test and fixed the failure and now you want to re-run it to make sure it is successful. To tell Surefire to only run that specific test, the test parameter can be passed on the command line:

$ mvn clean package -Dtest=MyTest

According to the documentation of the test goal of the Maven Surefire plugin, the test parameter can be used to further control the specific test methods to execute:

$ mvn clean package -Dtest=MyTest#testMethod

6. Resuming the build from a project

I was hesitating whether or not to include this one because it looks trivial and Maven usually points it out to the user upon a build failure, but I decided it’s still worth listing. Whenever an error occurs in a build, after fixing it you can use the option -rf, followed by a colon and the name of the failed module, to resume the build from that module and avoid re-building the modules that were already built successfully:

$ mvn clean install -rf :db-impl

7. Effective POM

Instead of navigating multiple POM files at different levels in your multi-module project and/or POM files defined in dependencies themselves in order to figure out what transitive dependencies are resolved or what plugin configuration is applied, a simple command can show the effective POM that consists of the entire configuration snapshot of the current POM, including inherited information from parent POMs such as properties, plugins, dependency information, and profiles.

$ mvn help:effective-pom | less

In Eclipse it can be viewed by clicking on the bottom tab labeled “Effective POM” within the default POM editor.

8. Building specific modules and their dependencies

In the case of multi-module projects with many dependent modules, you may want to specify explicitly which modules to build and ignore the others. For example you just want to build one or two modules you’re working on along with their dependencies, instead of building the whole list of modules. Instead of just doing mvn clean install from the aggregator POM, you can use the -pl command line option. For example, to build only module db-impl, you can execute the command:

$ mvn clean install -pl db-impl -am

The option -am, shorthand for --also-make, tells Maven to build also the projects required by the list in -pl.

9. Configuring JVM memory

Before building a project, Maven will analyze its hierarchy of modules to construct a graph of dependencies that specifies the order of building these individual modules. Sometimes this analysis step can require more memory than the default allocated to the JVM process of Maven, hence causing a Java heap space error. To configure these memory settings, the MAVEN_OPTS environment variable can be set:

$ export MAVEN_OPTS="-Xms256m -Xmx1024m"

10. Debugging a Maven plugin

Since Maven has a rich plugin ecosystem and it is easy to develop a custom plugin, a developer is likely to end up needing to debug a problem with such plugins. Assuming the source code of the plugin is imported into your IDE, you can run Maven in debug mode using the mvnDebug executable (e.g. mvnDebug clean install), and Maven will wait for a remote debugger in the IDE to attach on port 8000.

Conclusion

Knowing how a build tool like Maven works is essential in order to make the most of it, but there are some use cases that often repeat themselves where it’s worth remembering some quick solutions. If you have any other tips that are similar to the above, feel free to comment.

New Java HTTP Client

One of the features to be included with the upcoming JDK 11 release is a standardized HTTP client API that aims to replace the legacy HttpUrlConnection class, which has been present in the JDK since the very early years of Java. The problem with this legacy API is described in the enhancement proposal, mainly that it is by now outdated and difficult to use.

The new API supports both HTTP/1.1 and HTTP/2. The newer version of the HTTP protocol is designed to improve the overall performance of sending requests by a client and receiving responses from the server. This is achieved by introducing a number of changes such as stream multiplexing, header compression and push promises. In addition, the new HTTP client also natively supports WebSockets.

A new module named java.net.http that exports a package of the same name is defined in JDK 11, which contains the client interfaces:

module java.net.http {
    exports java.net.http;
}

You can view the API Javadocs here (note that since JDK 11 is not yet released, this API is not 100% final).

The package contains the following types:

  • HttpClient: the main entry point of the API. This is the HTTP client that is used to send requests and receive responses. It supports sending requests both synchronously and asynchronously, by invoking its methods send and sendAsync, respectively. To create an instance, a Builder is provided. Once created, the instance is immutable.
  • HttpRequest: encapsulates an HTTP request, including the target URI, the method (GET, POST, etc), headers and other information. A request is constructed using a builder, is immutable once created, and can be sent multiple times.
  • HttpRequest.BodyPublisher: If a request has a body (e.g. in POST requests), this is the entity responsible for publishing the body content from a given source, e.g. from a string, a file, etc.
  • HttpResponse: encapsulates an HTTP response, including headers and a message body if any. This is what the client receives after sending an HttpRequest.
  • HttpResponse.BodyHandler: a functional interface that accepts some information about the response (status code and headers), and returns a BodySubscriber, which itself handles consuming the response body.
  • HttpResponse.BodySubscriber: subscribes for the response body, and consumes its bytes into some other form (a string, a file, or some other storage type).

BodyPublisher is a subinterface of Flow.Publisher, introduced in Java 9. Similarly, BodySubscriber is a subinterface of Flow.Subscriber. This means that these interfaces are aligned with the reactive streams approach, which is suitable for asynchronously sending requests using HTTP/2.

Implementations for common types of body publishers, handlers and subscribers are pre-defined in the factory classes BodyPublishers, BodyHandlers and BodySubscribers. For example, to create a BodyHandler that processes the response body bytes (via an underlying BodySubscriber) as a string, the method BodyHandlers.ofString() can be used. If the response body needs to be saved in a file, the method BodyHandlers.ofFile() can be used.
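
For instance, a short sketch of saving a response body directly to a file (the file name is arbitrary; checked exceptions are omitted):

// save the response body to a local file and log where it was written
HttpResponse<Path> response = httpClient.send(request,
        BodyHandlers.ofFile(Paths.get("download.html")));
logger.info("Body saved to: " + response.body());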

Code examples

Specifying the HTTP protocol version

To create an HTTP client that prefers HTTP/2 (which is the default, so the version() call can be omitted):

HttpClient httpClient = HttpClient.newBuilder()
			   .version(Version.HTTP_2)  // this is the default
			   .build();

When HTTP/2 is specified, the first request to an origin server will try to use it. If the server supports the new protocol version, then the response will be sent using that version, and all subsequent requests/responses to that server will use HTTP/2. If the server does not support HTTP/2, then HTTP/1.1 will be used.

Specifying a proxy

To set a proxy for the request, the builder method proxy is used to provide a ProxySelector. If the proxy host and port are fixed, the selector can be created directly from the socket address:

HttpClient httpClient = HttpClient.newBuilder()
			   .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
			   .build();

Creating a GET request

The HTTP request methods have associated builder methods named after them. In the below example, GET() is optional:

HttpRequest request = HttpRequest.newBuilder()
               .uri(URI.create("https://http2.github.io/"))
               .GET()   // this is the default
               .build();

Creating a POST request with a body

To create a request that has a body in it, a BodyPublisher is required in order to convert the source of the body into bytes. One of the pre-defined publishers can be created from the static factory methods in BodyPublishers:

HttpRequest mainRequest = HttpRequest.newBuilder()
               .uri(URI.create("https://http2.github.io/"))
               .POST(BodyPublishers.ofString(json))
               .build();

Sending an HTTP request

There are two ways of sending a request: either synchronously (blocking until the response is received), or asynchronously. To send in blocking mode, we invoke the send() method on the HTTP client, providing the request instance and a BodyHandler. Here is an example that receives a response representing the body as a string:

HttpRequest request = HttpRequest.newBuilder()
               .uri(URI.create("https://http2.github.io/"))
               .build();

HttpResponse<String> response = httpClient.send(request, BodyHandlers.ofString());
logger.info("Response status code: " + response.statusCode());
logger.info("Response headers: " + response.headers());
logger.info("Response body: " + response.body());

Asynchronously sending an HTTP request

Sometimes it is useful to avoid blocking until the response is returned by the server. In this case we can call the method sendAsync(), which returns a CompletableFuture. A CompletableFuture provides a mechanism to chain subsequent actions to be triggered when it is completed. In this context, the returned CompletableFuture is completed when an HttpResponse is received. If you are not familiar with CompletableFuture, this post provides an overview and several examples to illustrate how to use it.

httpClient.sendAsync(request, BodyHandlers.ofString())
          .thenAccept(response -> {

       logger.info("Response status code: " + response.statusCode());
       logger.info("Response headers: " + response.headers());
       logger.info("Response body: " + response.body());
});

In the above example, sendAsync returns a CompletableFuture<HttpResponse<String>>. The thenAccept method adds a Consumer to be triggered when the response is available.

Sending multiple requests using HTTP/1.1

When loading a Web page in a browser using HTTP/1.1, several requests are sent behind the scenes. A request is first sent to retrieve the main HTML of the page, and then several requests are typically needed to retrieve the resources referenced by the HTML, e.g. CSS files, images and so on. To do this, several TCP connections are created to support the parallel requests, due to a limitation in the protocol where only one request/response can occur on a given connection at a time. However, the number of connections is usually limited (most tests on page loads seem to create 6 connections). This means that many requests will wait until previous requests are complete before they can be sent. The following example reproduces this scenario by loading a page that links to hundreds of images (taken from an online demo on HTTP/2).

A request is first sent to retrieve the HTML main resource. Then we parse the result, and for each image in the document a request is submitted in parallel using an executor with a limited number of threads:

ExecutorService executor = Executors.newFixedThreadPool(6);

HttpClient httpClient = HttpClient.newBuilder()
		.version(Version.HTTP_1_1)
		.build();

HttpRequest mainRequest = HttpRequest.newBuilder()
        .uri(URI.create("https://http2.akamai.com/demo/h2_demo_frame.html"))
        .build();

HttpResponse<String> mainResponse = httpClient.send(mainRequest, BodyHandlers.ofString());
String responseBody = mainResponse.body();

List<Future<?>> futures = new ArrayList<>();

// For each image resource in the main HTML, send a request on a separate thread
responseBody.lines()
            .filter(line -> line.trim().startsWith("<img height"))
            .map(line -> line.substring(line.indexOf("src='") + 5, line.indexOf("'/>")))
            .forEach(image -> {

             Future imgFuture = executor.submit(() -> {
                 HttpRequest imgRequest = HttpRequest.newBuilder()
                         .uri(URI.create("https://http2.akamai.com" + image))
                         .build();
                 try {
                     HttpResponse imageResponse = httpClient.send(imgRequest, BodyHandlers.ofString());
                     logger.info("Loaded " + image + ", status code: " + imageResponse.statusCode());
                 } catch (IOException | InterruptedException ex) {
                     logger.error("Error during image request for " + image, ex);
                 }
             });
             futures.add(imgFuture);
         });

// Wait for all submitted image loads to be completed
futures.forEach(f -> {
    try {
        f.get();
    } catch (InterruptedException | ExecutionException ex) {
        logger.error("Error waiting for image load", ex);
    }
});

Below is a snapshot of the TCP connections created by the previous HTTP/1.1 example:

TCPView_HTTP1_1

Sending multiple requests using HTTP/2

Running the scenario above using HTTP/2 (by setting version(Version.HTTP_2) on the client builder), we can see that similar latency is achieved with only one TCP connection, as shown in the screenshot below, hence using fewer resources. This is achieved through multiplexing, a key feature that enables multiple requests to be sent concurrently over the same connection, in the form of multiple streams of frames. Each request/response is decomposed into frames which are sent over a stream. The client is then responsible for assembling the frames into the final response.

TCPView_HTTP2

If we increase the level of parallelism by allowing more threads in the custom executor, latency drops noticeably, since more requests are sent in parallel over the same TCP connection.
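
The only change needed relative to the HTTP/1.1 example is the client construction; a minimal sketch (note that HTTP/2 is in fact the default version for HttpClient, with automatic fallback to HTTP/1.1 when the server does not support it):

HttpClient http2Client = HttpClient.newBuilder()
        .version(Version.HTTP_2)   // explicit, though HTTP_2 is the default
        .build();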

Handling push promises in HTTP/2

Some Web servers support push promises, whereby instead of the browser having to request every page asset, the server can guess which resources are likely to be needed by the client and push them to the client. For each resource, the server sends a special request known as a push promise in the form of a frame to the client. The HttpClient has an overloaded sendAsync method that allows us to handle such promises by either accepting them or rejecting them, as shown in the below example:

httpClient.sendAsync(mainRequest, BodyHandlers.ofString(), new PushPromiseHandler<String>() {

    @Override
    public void applyPushPromise(HttpRequest initiatingRequest, HttpRequest pushPromiseRequest,
            Function<BodyHandler<String>, CompletableFuture<HttpResponse<String>>> acceptor) {
        // invoke the acceptor function to accept the promise
        acceptor.apply(BodyHandlers.ofString())
                .thenAccept(resp -> logger.info("Got pushed response " + resp.uri()));
    }
});

Pushed resources can improve performance by eliminating the round-trips for resources the client would otherwise have to request explicitly, since the server sends them along with the initial response.

WebSocket example

The HTTP client also supports the WebSocket protocol which is used in real-time Web applications to provide client-server communication with low message overhead. Below is an example of how to use an HttpClient to create a WebSocket that connects to a URI, sends messages for one second and then closes its output. The API also makes use of asynchronous calls that return CompletableFuture:

// the original snippet assumes an existing executor; a single thread suffices here
ExecutorService executor = Executors.newSingleThreadExecutor();
HttpClient httpClient = HttpClient.newBuilder().executor(executor).build();
WebSocket.Builder webSocketBuilder = httpClient.newWebSocketBuilder();
WebSocket webSocket = webSocketBuilder.buildAsync(URI.create("wss://echo.websocket.org"), new WebSocket.Listener() {
    @Override
    public void onOpen(WebSocket webSocket) {
        logger.info("CONNECTED");
        webSocket.sendText("This is a message", true);
        Listener.super.onOpen(webSocket);
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        logger.info("onText received with data " + data);
        if(!webSocket.isOutputClosed()) {
            webSocket.sendText("This is a message", true);
        }
        return Listener.super.onText(webSocket, data, last);
    }

    @Override
    public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
        logger.info("Closed with status " + statusCode + ", reason: " + reason);
        executor.shutdown();
        return Listener.super.onClose(webSocket, statusCode, reason);
    }
}).join();
logger.info("WebSocket created");

Thread.sleep(1000);
webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "ok").thenRun(() -> logger.info("Sent close"));

Conclusion

The new HTTP client API provides a standard way to perform HTTP network operations, with support for modern Web features such as HTTP/2, without the need for third-party dependencies. The full code of the above examples can be viewed here. If you enjoyed this post, feel free to share it!

OpenJDK references:
http://openjdk.java.net/groups/net/httpclient/intro.html
http://openjdk.java.net/groups/net/httpclient/recipes.html

Introduction to Java Bytecode

Reading compiled Java bytecode can be tedious, even for experienced Java developers. Why do we need to know about such low-level details in the first place? Here is a simple scenario that happened to me last week: I had made some code changes on my machine a long time ago, compiled a Jar and deployed it on a server to test a potential fix for a performance issue. Unfortunately, the code was never checked in to a version control system, and for whatever reason the local changes were deleted without a trace. A couple of months later, I needed those changes (which had taken quite an effort to come up with) in source form again, but could not find them!

Luckily the compiled code still existed on that remote server. So with a sigh of relief I fetched the Jar again and opened it using a decompiler editor. Only one problem: the decompiler GUI is not a flawless tool, and out of the many classes in that Jar, the specific class I needed was, for some reason, the one that triggered a bug in the UI and crashed the decompiler whenever I opened it!

Desperate times call for desperate measures… Fortunately, I was familiar with raw bytecode, and I preferred to spend some time manually decompiling pieces of the code rather than reworking and retesting the changes from scratch. Since I still remembered at least where to look in the code, reading the bytecode helped me pinpoint the exact changes and reconstruct them in source form. (I made sure to learn from my mistake and preserve them this time!)

The nice thing about bytecode is that you learn its syntax once and it applies to all platforms supported by Java, because it is an intermediate representation of the code, not the actual executable code for the underlying CPU. Moreover, bytecode is simpler than native machine code because the JVM architecture is rather simple, which in turn keeps the instruction set simple. Yet another nice thing is that all instructions in the set are fully documented by Oracle.

Before learning about the bytecode instruction set though, let’s get familiar with a few things about the JVM which are needed as a prerequisite.

JVM data types

Java is statically typed, and this affects the design of the bytecode instructions: each instruction expects operands of specific types. For example, there are several add instructions for adding two numbers: iadd, ladd, fadd and dadd, operating on int, long, float and double operands respectively. The majority of bytecode instructions share this characteristic: the same operation exists in different forms depending on the operand types.
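
For example, a hypothetical static method adding two long values compiles to ladd rather than iadd. A sketch (instruction listings like this one are produced with the javap tool shown later in this post):

static long addLongs(long a, long b) {
    return a + b;
}

// resulting instructions:
// 0: lload_0
// 1: lload_2
// 2: ladd
// 3: lreturn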

The data types defined by the JVM are:

  1. Primitive types:
    • Numeric types: byte (8-bit 2’s complement), short (16-bit 2’s complement), int (32-bit 2’s complement), long (64-bit 2’s complement), char (16-bit unsigned Unicode), float (32-bit IEEE 754 single precision FP), double (64-bit IEEE 754 double precision FP)
    • boolean type
    • returnAddress: pointer to instruction
  2. Reference types:
    • Class types
    • Array types
    • Interface types

The boolean type has limited support in bytecode. For example, there are no instructions that directly operate on boolean values. Boolean values are instead converted to int by the compiler and the corresponding int instruction is used.
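
For instance, a boolean local variable initialized to true compiles to the same instructions as an int initialized to 1 (a sketch, assuming the variable lands at index 1 of the frame):

boolean flag = true;   // inside some method

// compiles to:
// 0: iconst_1
// 1: istore_1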

Java developers should be familiar with all of the above types, except returnAddress which has no equivalent programming language type.

Stack-based architecture

The simplicity of the bytecode instruction set is largely due to Sun having designed a stack-based VM architecture, as opposed to a register-based one. There are various memory components used by a JVM process, but only a few need to be examined in detail in order to follow bytecode instructions:

PC register: for each thread running in a Java program, a PC register stores the address of the current instruction.

JVM stack: for each thread, a stack is allocated where local variables, method arguments and return values are stored. Here is an illustration showing stacks for 3 threads.

jvm_stacks

Heap: memory shared by all threads, and storing objects (class instances and arrays). Object deallocation is managed by a garbage collector.

heap.png

Method area: for each loaded class, stores the code of methods and a table of symbols (e.g. references to fields or methods) and constants known as the constant pool.

method_area.png

A JVM stack is composed of frames, each pushed onto the stack when a method is invoked and popped from the stack when the method completes (either by returning normally or by throwing an exception). Each frame further consists of:

  1. An array of local variables, indexed from 0 to its length minus 1. The length is computed by the compiler. A local variable can hold a value of any type, except that long and double values each occupy two consecutive local variables (a short example follows the illustration below).
  2. An operand stack used to store intermediate values that would act as operands for instructions, or to push arguments to method invocations.

stack_frame_zoom.png
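
To make the two-slot rule for long and double concrete, here is a sketch of a hypothetical static method and the local variable slots its locals occupy:

static void slots() {
    long l = 1L;   // occupies local variables 0 and 1 (stored with lstore_0)
    int i = 2;     // stored in local variable 2 (istore_2)
}
// the Code attribute for this method reports locals=3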

Bytecode explored

With this picture of the JVM internals in mind, we can look at some basic bytecode examples generated from sample code. Each method in a Java class file has a code segment that consists of a sequence of instructions, each having the following format:

opcode (1 byte)      operand1 (optional)      operand2 (optional)      ...

That is, an instruction consists of a one-byte opcode followed by zero or more operands that contain the data to operate on.

Within the stack frame of the currently executing method, an instruction can push or pop values on the operand stack, and it can load or store values in the local variable array. Let’s look at a simple example:

public static void main(String[] args) {
    int a = 1;
    int b = 2;
    int c = a + b;
}

In order to print the resulting bytecode in the compiled class (assuming it is in a file Test.class), we can run the javap tool:

javap -v Test.class

and we get:

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=4, args_size=1
       0: iconst_1
       1: istore_1
       2: iconst_2
       3: istore_2
       4: iload_1
       5: iload_2
       6: iadd
       7: istore_3
       8: return
...

We can see the method signature for the main method, a descriptor indicating that the method takes an array of Strings ([Ljava/lang/String;) and has a void return type (V). A set of flags follows, describing the method as public (ACC_PUBLIC) and static (ACC_STATIC).

The most important part is the Code attribute, which contains the instructions for the method along with information such as the maximum depth of the operand stack (2 in this case), and the number of local variables allocated in the frame for this method (4 in this case). All local variables are referenced in the above instructions except the first one (at index 0) which holds the reference to the args argument. The other 3 local variables correspond to variables a, b and c in the source code.

The instructions from address 0 to 8 will do the following:

iconst_1: Push the integer constant 1 onto the operand stack.

iconst_1.png

istore_1: Pop the top operand (an int value) and store it in local variable at index 1, which corresponds to variable a.

istore_1.png

iconst_2: Push the integer constant 2 onto the operand stack.

iconst_2.png

istore_2: Pop the top operand int value and store it in local variable at index 2, which corresponds to variable b.

istore_2.png

iload_1: Load the int value from local variable at index 1 and push it onto the operand stack.

iload_1.png

iload_2: Load the int value from local variable at index 2 and push it onto the operand stack.

iload_2.png

iadd: Pop the top two int values from the operand stack, add them and push the result back onto the operand stack.

iadd

istore_3: Pop the top operand int value and store it in local variable at index 3, which corresponds to variable c.

istore_3.png

return: Return from the void method.

Each of the above instructions consists of only an opcode, which dictates exactly the operation to be executed by the JVM.

Method invocations

In the above example, there is only one method, the main method. Let’s assume that we need a more elaborate computation for the value of variable c, and we decide to place it in a new method called calc:

public static void main(String[] args) {
    int a = 1;
    int b = 2;
    int c = calc(a, b);
}

static int calc(int a, int b) {
    return (int) Math.sqrt(Math.pow(a, 2) + Math.pow(b, 2));
}

Let’s see the resulting bytecode:

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=4, args_size=1
       0: iconst_1
       1: istore_1
       2: iconst_2
       3: istore_2
       4: iload_1
       5: iload_2
       6: invokestatic  #2         // Method calc:(II)I
       9: istore_3
      10: return

static int calc(int, int);
  descriptor: (II)I
  flags: (0x0008) ACC_STATIC
  Code:
    stack=6, locals=2, args_size=2
       0: iload_0
       1: i2d
       2: ldc2_w        #3         // double 2.0d
       5: invokestatic  #5         // Method java/lang/Math.pow:(DD)D
       8: iload_1
       9: i2d
      10: ldc2_w        #3         // double 2.0d
      13: invokestatic  #5         // Method java/lang/Math.pow:(DD)D
      16: dadd
      17: invokestatic  #6         // Method java/lang/Math.sqrt:(D)D
      20: d2i
      21: ireturn

The only difference in the main method code is that instead of the iadd instruction, we now have an invokestatic instruction, which simply invokes the static method calc. The key thing to note is that the operand stack contains the two arguments that are passed to the method calc. In other words, the calling method prepares all arguments of the to-be-called method by pushing them onto the operand stack in the correct order. invokestatic (or a similar invoke* instruction, as will be seen later) subsequently pops these arguments, and a new frame is created for the invoked method, where the arguments are placed in its local variable array.

We also notice that the invokestatic instruction occupies 3 bytes, as seen from the address jumping from 6 to 9. This is because, unlike all instructions seen so far, invokestatic includes two additional bytes (in addition to the opcode) to construct the reference to the method to be invoked. The reference is shown by javap as #2, a symbolic reference to the calc method, which is resolved from the constant pool described earlier.
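
To illustrate, invokestatic has opcode value 0xb8 (per the JVM specification), so the instruction at address 6 is encoded as the three bytes:

b8 00 02    // opcode 0xb8, followed by the two-byte constant pool index #2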

The other new information is obviously the code for the calc method itself. It first loads the first integer argument onto the operand stack (iload_0). The next instruction i2d converts it to a double by applying widening conversion. The resulting double replaces the top of the operand stack.

The next instruction pushes a double constant 2.0d (taken from the constant pool) onto the operand stack. Then the static Math.pow method is invoked with the two operand values prepared so far (the first argument to calc, and the constant 2.0d). When Math.pow returns, its result is stored on the operand stack of its invoker. This is illustrated below.

math_pow.png

The same procedure is applied to compute Math.pow(b, 2):

math_pow2.png

The next instruction dadd pops the top two intermediate results, adds them and pushes the sum back to the top. Finally, invokestatic invokes Math.sqrt on the resulting sum, and the result is cast from double to int using narrowing conversion (d2i). The resulting int is returned to the main method, which stores it back into c (istore_3).

Instance creations

Let’s modify the example and introduce a class Point to encapsulate XY coordinates.

public class Test {
    public static void main(String[] args) {
        Point a = new Point(1, 1);
        Point b = new Point(5, 3);
        int c = a.area(b);
    }
}

class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int area(Point b) {
        int length = Math.abs(b.y - this.y);
        int width = Math.abs(b.x - this.x);
        return length * width;
    }
}

The compiled bytecode for the main method is shown below:

 public static void main(java.lang.String[]);
   descriptor: ([Ljava/lang/String;)V
   flags: (0x0009) ACC_PUBLIC, ACC_STATIC
   Code:
     stack=4, locals=4, args_size=1
        0: new           #2       // class test/Point
        3: dup
        4: iconst_1
        5: iconst_1
        6: invokespecial #3       // Method test/Point."<init>":(II)V
        9: astore_1
       10: new           #2       // class test/Point
       13: dup
       14: iconst_5
       15: iconst_3
       16: invokespecial #3       // Method test/Point."<init>":(II)V
       19: astore_2
       20: aload_1
       21: aload_2
       22: invokevirtual #4       // Method test/Point.area:(Ltest/Point;)I
       25: istore_3
       26: return

The new instructions encountered here are new, dup and invokespecial. Similar to the new operator in the programming language, the new instruction creates an object of the type specified in its operand (a symbolic reference to the class Point). Memory for the object is allocated on the heap, and a reference to the object is pushed onto the operand stack.

The dup instruction duplicates the top operand stack value, which means that we now have two references to the Point object on the top of the stack. The next three instructions push the constructor arguments onto the operand stack, and then invoke a special initialization method called <init>, which corresponds to the constructor. The <init> method is where the fields x and y get initialized. After the method finishes, the top three operand stack values are consumed, and what remains is the original reference to the created object (which is by now fully initialized).

init.png

Next, astore_1 pops that Point reference and assigns it to the local variable at index 1 (the a prefix in astore_1 indicates that a reference value is being stored).

init_store.png

The same procedure is repeated for creating and initializing the second Point instance, which is assigned to variable b.

init2.png

init_store2.png

The last step loads the references to the two Point objects from the local variables at indexes 1 and 2 (using aload_1 and aload_2 respectively), and invokes the area method using invokevirtual, which handles dispatching the call to the appropriate method based on the actual type of the object. For example, if variable a contained an instance of a type SpecialPoint extending Point, and the subtype overrides the area method, then the overridden method would be invoked (a sketch of such a subclass follows below). In this case there is no subclass, so only one area method can be invoked.

area.png

Note that even though the area method accepts one argument, there are two Point references on the top of the stack. The first one (pointA, which comes from variable a) is the instance on which the method is invoked (otherwise referred to as this in the programming language), and it will be placed in the first local variable of the new frame for the area method. The other operand value (pointB) is the argument to area.
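
For illustration, here is a hypothetical SpecialPoint subclass whose area override would be selected by invokevirtual at runtime:

class SpecialPoint extends Point {
    SpecialPoint(int x, int y) {
        super(x, y);
    }

    @Override
    public int area(Point b) {
        // this overriding method would be dispatched to if variable 'a' held a SpecialPoint
        return 2 * super.area(b);
    }
}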

The other way around

You don’t need to master every instruction and the exact flow of execution to get an idea of what a program does from its bytecode. For example, in my case I wanted to check whether the code employed a Java stream to read a file, and whether the stream was properly closed. Given the bytecode below, it is relatively easy to determine that a stream is indeed used, and that it is most likely closed as part of a try-with-resources statement.

 public static void main(java.lang.String[]) throws java.lang.Exception;
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=8, args_size=1
       0: ldc           #2                  // class test/Test
       2: ldc           #3                  // String input.txt
       4: invokevirtual #4                  // Method java/lang/Class.getResource:(Ljava/lang/String;)Ljava/net/URL;
       7: invokevirtual #5                  // Method java/net/URL.toURI:()Ljava/net/URI;
      10: invokestatic  #6                  // Method java/nio/file/Paths.get:(Ljava/net/URI;)Ljava/nio/file/Path;
      13: astore_1
      14: new           #7                  // class java/lang/StringBuilder
      17: dup
      18: invokespecial #8                  // Method java/lang/StringBuilder."<init>":()V
      21: astore_2
      22: aload_1
      23: invokestatic  #9                  // Method java/nio/file/Files.lines:(Ljava/nio/file/Path;)Ljava/util/stream/Stream;
      26: astore_3
      27: aconst_null
      28: astore        4
      30: aload_3
      31: aload_2
      32: invokedynamic #10,  0             // InvokeDynamic #0:accept:(Ljava/lang/StringBuilder;)Ljava/util/function/Consumer;
      37: invokeinterface #11,  2           // InterfaceMethod java/util/stream/Stream.forEach:(Ljava/util/function/Consumer;)V
      42: aload_3
      43: ifnull        131
      46: aload         4
      48: ifnull        72
      51: aload_3
      52: invokeinterface #12,  1           // InterfaceMethod java/util/stream/Stream.close:()V
      57: goto          131
      60: astore        5
      62: aload         4
      64: aload         5
      66: invokevirtual #14                 // Method java/lang/Throwable.addSuppressed:(Ljava/lang/Throwable;)V
      69: goto          131
      72: aload_3
      73: invokeinterface #12,  1           // InterfaceMethod java/util/stream/Stream.close:()V
      78: goto          131
      81: astore        5
      83: aload         5
      85: astore        4
      87: aload         5
      89: athrow
      90: astore        6
      92: aload_3
      93: ifnull        128
      96: aload         4
      98: ifnull        122
     101: aload_3
     102: invokeinterface #12,  1           // InterfaceMethod java/util/stream/Stream.close:()V
     107: goto          128
     110: astore        7
     112: aload         4
     114: aload         7
     116: invokevirtual #14                 // Method java/lang/Throwable.addSuppressed:(Ljava/lang/Throwable;)V
     119: goto          128
     122: aload_3
     123: invokeinterface #12,  1           // InterfaceMethod java/util/stream/Stream.close:()V
     128: aload         6
     130: athrow
     131: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
     134: aload_2
     135: invokevirtual #16                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
     138: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     141: return
    ...

We see occurrences of java/util/stream/Stream where forEach is called, preceded by an invokedynamic instruction that produces a Consumer. Then we see a chunk of bytecode that calls Stream.close, along with branches that call Throwable.addSuppressed. This is the code that the compiler generates for a try-with-resources statement.

Here’s the original source for completeness:

public static void main(String[] args) throws Exception {
    Path path = Paths.get(Test.class.getResource("input.txt").toURI());
    StringBuilder data = new StringBuilder();
    try (Stream<String> lines = Files.lines(path)) {
        lines.forEach(line -> data.append(line).append("\n"));
    }

    System.out.println(data.toString());
}

Conclusion

Thanks to the simplicity of the bytecode instruction set and the near absence of compiler optimizations when generating its instructions, disassembling class files can be one way to examine changes in your application code without having the source, if that ever becomes a need.

Compact Strings in Java 9

One of the performance enhancements introduced in the JVM (Oracle HotSpot to be specific) as part of Java SE 9 is compact strings. It aims to reduce the size of String objects, hence reducing the overall footprint of Java applications. As a result, it can also reduce the time spent on garbage collection.

The feature is based on the observation that most String objects do not need 2 bytes to encode every character, because most applications use only Latin-1 characters. Hence, instead of having:

/** The value is used for character storage. */
private final char value[];

java.lang.String now has:

private final byte[] value;
/**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
private final byte coder;

In other words, this feature replaces the char array value (where each element uses 2 bytes) with a byte array, plus an extra byte that identifies the encoding (Latin-1 or UTF-16). This means that for most applications, which use only Latin-1 characters, strings take up roughly half the heap they previously did. The feature is completely invisible to the user, and related APIs such as StringBuilder automatically make use of it.

To demonstrate this change in terms of the size used by a String object, I’ll be using Java Object Layout (JOL), a simple utility that can visualize the structure of an object in the heap. We are interested in the footprint of the array stored in the value field, not merely the reference to it (both a byte array reference and a char array reference use 4 bytes). The following prints this information using a JOL GraphLayout:

import org.openjdk.jol.info.GraphLayout;

public class JOLSample {

    public static void main(String[] args) {
        // print the total heap footprint of the string, including the backing array
        System.out.println(GraphLayout.parseInstance("abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz").toFootprint());
    }
}

Running the above against Java 8 and then against Java 9 shows the difference:

$java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

$java -cp lib\jol-cli-0.9-full.jar;. test.JOLSample
java.lang.String@4554617cd footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1       432       432   [C
         1        24        24   java.lang.String
         2                 456   (total)

...

$java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

$java -cp lib\jol-cli-0.9-full.jar;. test.JOLSample
java.lang.String@73035e27d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1       224       224   [B
         1        24        24   java.lang.String
         2                 248   (total)

Ignoring the 24 bytes used by the java.lang.String object itself (header plus fields), we see the backing array shrink from 432 bytes ([C) to 224 bytes ([B), almost half the size, thanks to string compaction.

If we change the above String to use a UTF-16 character such as \u0780, and then re-run the above, both Java 8 and Java 9 show the same footprint because the compaction no longer occurs.
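
For instance, a single non-Latin-1 character is enough to force the whole string into UTF-16 encoding (a minimal sketch using the same JOL approach):

// the \u0780 character cannot be encoded in Latin-1, so the UTF-16 coder is used
System.out.println(GraphLayout.parseInstance("abc\u0780").toFootprint());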

This feature can be disabled by passing the option -XX:-CompactStrings to the java command.
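
For example, re-running the JOL test with compaction turned off (the classpath mirrors the earlier runs):

java -XX:-CompactStrings -cp lib\jol-cli-0.9-full.jar;. test.JOLSample

With compaction disabled, the byte array always uses the UTF-16 coder, so the footprint goes back to 2 bytes per character.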