My random thoughts about software engineering

An aspiring software craftsman journey, By Mahmoud Ben Hassine

How I Reduced My Java App Code By 80% Using Easy Batch

In this post, I will show you how Easy Batch can greatly simplify batch application development by taking care of the boilerplate code you would otherwise write yourself. This makes your application more readable, understandable and maintainable.

The use case is a typical production application that loads data from a CSV file into a relational database table. Here is the input file containing product data:

#id,name,description,price,published,lastUpdate
0001,product1,description1,2500,true,2014-01-01
000x,product2,description2,2400,true,2014-01-01
0003,,description3,2300,true,2014-01-01
0004,product4,description4,-2200,true,2014-01-01
0005,product5,description5,2100,true,2024-01-01
0006,product6,description6,2000,true,2014-01-01,Blah!

Let’s assume we have a JPA EntityManager used to persist Product objects to the database. We would like to map each record of this file to an instance of the following Product POJO:

import java.util.Date;

public class Product {
    private long id;
    private String name;
    private String description;
    private double price;
    private boolean published;
    private Date lastUpdate;
    // getters, setters and toString() omitted
}

Before persisting products to the database, data must be validated to ensure that:

  • product id and name are specified
  • product price is not negative
  • product last update date is in the past

Finally, records starting with # should be ignored: mainly the header record, but possibly also comments and a trailer record.
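Stated as code, these rules are small. The sketch below is one possible way to write them in plain Java. The class and method names (ProductRules, isFiltered and so on) are made up for illustration, and the reference date is passed in explicitly, which is an assumption of mine so that the "in the past" check does not depend on when the code runs.

```java
import java.util.Date;

// Hypothetical helper, for illustration only: the rules above as small checks.
final class ProductRules {

    // Records starting with '#' (header, comments, trailer) are skipped.
    static boolean isFiltered(String record) {
        return record.startsWith("#");
    }

    // Id and name are mandatory: the raw CSV field must not be empty.
    static boolean isSpecified(String field) {
        return field != null && !field.isEmpty();
    }

    // The price must not be negative (zero is allowed).
    static boolean isValidPrice(double price) {
        return price >= 0;
    }

    // The last update date must be strictly before the reference date.
    static boolean isInThePast(Date lastUpdate, Date now) {
        return lastUpdate.before(now);
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date yesterday = new Date(now.getTime() - 24L * 60 * 60 * 1000);
        System.out.println(isFiltered("#id,name,description,price,published,lastUpdate")); // true
        System.out.println(isSpecified(""));             // false
        System.out.println(isValidPrice(-2200));         // false
        System.out.println(isInThePast(yesterday, now)); // true
    }
}
```

The checks themselves fit in a handful of lines. Keep that in mind when comparing them with the amount of code needed around them.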

To keep the example simple, I will write product data to the standard output and not to a database. So let’s get started!

The following listing is a possible (horrible) solution that I have seen hundreds of times in production systems:

import java.io.File;
import java.io.FileNotFoundException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Scanner;

public class WithoutEasyBatch {

    public static long nbFiltered = 0, nbIgnored = 0, nbRejected = 0, nbProcessed = 0;
    public static SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        Scanner scanner = new Scanner(new File(args[0]));
        long currentRecordNumber = 0;
        while (scanner.hasNextLine()) {
            currentRecordNumber++;
            String record = scanner.nextLine();
            if (record.startsWith("#")) {
                System.err.println("record N°" + currentRecordNumber + " [" + record + "] filtered");
                nbFiltered++;
                continue;
            }
            String[] tokens = record.split(",");
            if (tokens.length != 6) {
                System.err.println("record N°" + currentRecordNumber
                    + " [" + record + "] ignored : unexpected record size " + tokens.length);
                nbIgnored++;
                continue;
            }
            Product product = new Product();
            String token = tokens[0];//product id
            if (token.isEmpty()) {
                rejectRecord(currentRecordNumber, record, "product Id is mandatory but was not specified");
                continue;
            }
            try {
                product.setId(Long.parseLong(token));
            } catch (NumberFormatException e) {
                rejectRecord(currentRecordNumber, record, "Unable to convert " + token + " to type long for field id");
                continue;
            }
            token = tokens[1];//product name
            if (token.isEmpty()) {
                rejectRecord(currentRecordNumber, record, "product name is mandatory but was not specified");
                continue;
            }
            product.setName(token);
            product.setDescription(tokens[2]); //product description
            token = tokens[3];//product price
            if (token.isEmpty()) {
                rejectRecord(currentRecordNumber, record, "product price is mandatory but was not specified");
                continue;
            }
            try {
                double price = Double.parseDouble(token);
                if (price < 0) {
                    rejectRecord(currentRecordNumber, record, "Product price must not be negative");
                    continue;
                }
                product.setPrice(price);
            } catch (NumberFormatException e) {
                rejectRecord(currentRecordNumber, record, "Unable to convert " + token + " to type double for field price");
                continue;
            }
            product.setPublished(Boolean.parseBoolean(tokens[4]));
            token = tokens[5];//product last update date
            try {
                Date lastUpdate = dateFormat.parse(token);
                if (lastUpdate.after(new Date())) {
                    rejectRecord(currentRecordNumber, record, "Last update date must be in the past");
                    continue;
                }
                product.setLastUpdate(lastUpdate);
            } catch (ParseException e) {
                rejectRecord(currentRecordNumber, record, "Unable to convert " + token + " to a date for field lastUpdate");
                continue;
            }
            System.out.println("product = " + product);// save product to database here
            nbProcessed++;
        }
        System.out.println("Job Report:");
        System.out.println("Job duration:" + (System.currentTimeMillis() - startTime) + "ms");
        System.out.println("total records = " + currentRecordNumber);
        System.out.println("nbFiltered = " + nbFiltered);
        System.out.println("nbIgnored = " + nbIgnored);
        System.out.println("nbRejected = " + nbRejected);
        System.out.println("nbProcessed = " + nbProcessed);
    }

    public static void rejectRecord(long currentRecordNumber, String record, String cause) {
        System.err.println("record N°" + currentRecordNumber + " [" + record + "] rejected : " + cause);
        nbRejected++;
    }

}

This solution actually works perfectly and implements the requirements above. But it’s an obvious maintenance nightmare! It would be even worse if the Product POJO contained dozens of fields, which is often the case in production systems.

In this solution, there is only one line which represents the batch business logic. Do you see it? Here it is:

System.out.println("product = " + product);

In production, this line would be persisting the object to the database. All the rest is boilerplate: handling IO, reading, filtering, parsing and validating data, type conversion, mapping records to Product instances, logging and reporting statistics at the end of execution.

The idea behind Easy Batch is to handle all of this error-prone boilerplate code for you. With Easy Batch, you focus only on your batch business logic. So let’s see what the solution looks like with Easy Batch.

First, I will create a RecordProcessor to implement the business logic:

import org.easybatch.core.processor.RecordProcessor;
import org.easybatch.core.record.Record;

public class ProductProcessor implements RecordProcessor<Record, Record> {

    public Record processRecord(final Record record) throws Exception {
        System.out.println("product = " + record.getPayload());
        return record;
    }

}

Then, I will declare (rather than implement, as in the solution above) data validation constraints on the Product POJO with the elegant Bean Validation API:

import org.hibernate.validator.constraints.NotEmpty;

import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Past;
import java.util.Date;

public class Product {

    @NotNull
    private long id;

    @NotEmpty
    private String name;

    private String description;

    @Min(0)
    private double price;

    private boolean published;

    @Past
    private Date lastUpdate;

    // getters, setters and toString() omitted

}

Finally, I need to configure a job to:

  • Read data from the flat file products.csv
  • Filter records starting with #
  • Map each CSV record to an instance of the Product POJO
  • Validate product data
  • Process each record using the ProductProcessor implementation

This can be done with the following snippet:

import org.easybatch.core.filter.StartWithStringRecordFilter;
import org.easybatch.core.job.Job;
import org.easybatch.core.job.JobBuilder;
import org.easybatch.core.job.JobExecutor;
import org.easybatch.core.job.JobReport;
import org.easybatch.flatfile.DelimitedRecordMapper;
import org.easybatch.flatfile.FlatFileRecordReader;
import org.easybatch.validation.BeanValidationRecordValidator;

public class WithEasyBatch {

    public static void main(String[] args) throws Exception {
        Job job = new JobBuilder()
                .reader(new FlatFileRecordReader(args[0]))
                .filter(new StartWithStringRecordFilter("#"))
                .mapper(new DelimitedRecordMapper(Product.class, "id", "name", "description", "price", "published", "lastUpdate"))
                .validator(new BeanValidationRecordValidator<Product>())
                .processor(new ProductProcessor())
                .build();

        JobExecutor jobExecutor = new JobExecutor();
        JobReport report = jobExecutor.execute(job);
        jobExecutor.shutdown();

        System.out.println("job report = " + report);
    }

}

That’s all. Apart from implementing the core business logic, all I have done is provide configuration metadata that Easy Batch cannot guess. The framework will take care of all the boilerplate code of reading, filtering, parsing, validating and mapping data to domain objects.
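To give a rough feel for what a framework has to do behind such a builder, here is a plain-JDK sketch of a filter, map, validate and process loop. This is not Easy Batch's implementation or API: MiniJob and its fields are hypothetical names, and the example works on a list of price strings rather than on a file, just to keep it short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, simplified pipeline; NOT how Easy Batch is implemented.
final class MiniJob<T> {

    private final Predicate<String> filter;    // true = skip the record
    private final Function<String, T> mapper;  // raw record -> domain object, may throw
    private final Predicate<T> validator;      // true = the object is valid
    private final Consumer<T> processor;       // the business logic

    long filtered, rejected, processed;        // counters for the final report

    MiniJob(Predicate<String> filter, Function<String, T> mapper,
            Predicate<T> validator, Consumer<T> processor) {
        this.filter = filter;
        this.mapper = mapper;
        this.validator = validator;
        this.processor = processor;
    }

    void run(List<String> records) {
        for (String record : records) {
            if (filter.test(record)) {
                filtered++;
                continue;
            }
            T item;
            try {
                item = mapper.apply(record);
            } catch (RuntimeException e) {
                rejected++; // mapping or type conversion failed
                continue;
            }
            if (!validator.test(item)) {
                rejected++; // a validation constraint was violated
                continue;
            }
            processor.accept(item);
            processed++;
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("#price", "2500", "abc", "-2200");
        MiniJob<Double> job = new MiniJob<Double>(
                record -> record.startsWith("#"),
                Double::parseDouble,
                price -> price >= 0,
                price -> System.out.println("price = " + price));
        job.run(records);
        // prints: filtered=1, rejected=2, processed=1
        System.out.println("filtered=" + job.filtered
                + ", rejected=" + job.rejected + ", processed=" + job.processed);
    }
}
```

The only part that changes from one batch to the next is the four small functions handed to the constructor. The loop, the error handling and the counters are written once.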

Time to do some math by counting total lines of code. Both solutions use the Product POJO, so I’ll ignore it. Imports are irrelevant too, so I’ll ignore them as well.

  • The first solution WithoutEasyBatch has 84 lines of code (empty lines have been ignored). Note that I have inlined all variables and tried to make it as compact as possible.
  • The second solution WithEasyBatch has:
    • 6 lines for the ProductProcessor class
    • 4 lines for Bean Validation API annotations added on the Product POJO
    • 11 lines for the main class

In sum, 84 LOC vs 21 LOC, which is 75% fewer lines than the first solution.
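For anyone who wants to check the arithmetic, here is a tiny sketch using the counts listed above. LocReduction and reductionPercent are names I made up for this example.

```java
// Hypothetical helper to double-check the line count comparison.
final class LocReduction {

    // Percentage of lines saved when going from `before` lines to `after` lines.
    static double reductionPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int without = 84;      // WithoutEasyBatch
        int with = 6 + 4 + 11; // ProductProcessor + annotations + main class
        // prints: 21 LOC, 75.0% fewer
        System.out.println(with + " LOC, " + reductionPercent(without, with) + "% fewer");
    }
}
```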

Oh wait, this is not 80% as claimed in the title of the post! OK, you got me. But if I count monitoring, transaction processing and batching, which I get for free from Easy Batch, I could actually put 90% or even 95% in the post title!

I hope you got the point and agree with me: the second solution is easier to read, understand, test and maintain.

Summary

To sum up, this is what Easy Batch is all about: making your life easier when you have to deal with batch applications in Java. The main motivation behind the framework is to let you focus on your business logic while it takes care of the boilerplate code for you.

Comments and feedback are welcome!