My random thoughts about software engineering

An aspiring software craftsman journey, By Mahmoud Ben Hassine

Spring Batch Vs Easy Batch: A Performance Comparison

In the previous post, I compared Easy Batch and Spring Batch in terms of features, ease of use and developer productivity. I came to the conclusion that, without a doubt, Spring Batch provides a richer feature set and allows you to do much more than Easy Batch does.

In this post, I will compare Spring Batch and Easy Batch in terms of performance. The goal of the benchmark is to process customer data in the following CSV flat file:

id,firstName,lastName,birthDate,email,phone,street,zipCode,city,country
41837,Due,Pearson,2015-08-31,Liza.Diaz@yopmail.org,0102030405,Fifth Avenue,12345,NewYork,China
454205,Liza,Dickson,2015-06-23,Duke.Pearson@hotmail.com,0102030405,Oxford Street,12345,Paris,Germany
852684,Brad,Hinton,2015-08-31,Tommy.Dickson@hotmail.edu,0504030201,Fifth Avenue,54321,Rome,Italy

I will measure the performance of reading, parsing and mapping data to the following domain object:

public class Customer {

    private int id;
    private String firstName;
    private String lastName;
    private Date birthDate;
    private String email;
    private String phone;
    private String street;
    private String zipCode;
    private String city;
    private String country;

    // Getters and setters omitted

}

The processing logic depends heavily on the use case, so it will be omitted.
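
To make the measured work concrete, here is a minimal plain-JDK sketch of what "parsing and mapping" one line of the file to a customer involves. It is not how Easy Batch or Spring Batch implement it. The class name CsvLineMapper and the trimmed-down nested Customer are hypothetical, and the sketch assumes fields never contain quotes or embedded commas:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for illustration; not part of Easy Batch or Spring Batch.
class CsvLineMapper {

    // Trimmed-down stand-in for the Customer class (package-private fields for brevity).
    static class Customer {
        int id;
        String firstName, lastName;
        Date birthDate;
        String email, phone, street, zipCode, city, country;
    }

    // Maps one data line (not the header) to a Customer.
    // Assumption: fields contain no quotes and no embedded commas.
    static Customer map(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != 10) {
            throw new IllegalArgumentException("Expected 10 fields but got " + fields.length);
        }
        Customer customer = new Customer();
        customer.id = Integer.parseInt(fields[0]);
        customer.firstName = fields[1];
        customer.lastName = fields[2];
        customer.birthDate = parseDate(fields[3]);
        customer.email = fields[4];
        customer.phone = fields[5];
        customer.street = fields[6];
        customer.zipCode = fields[7];
        customer.city = fields[8];
        customer.country = fields[9];
        return customer;
    }

    // SimpleDateFormat is not thread-safe, so a new instance is created per call.
    private static Date parseDate(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid date: " + text, e);
        }
    }
}
```

Both frameworks do this work (plus validation, type conversion via reflection and error handling) for every record, which is why it dominates the measured time.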

I will use the Random Beans library to generate several files of different sizes for the benchmark: 10,000, 100,000, 1,000,000 and 10,000,000 customers.
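
If you want to reproduce input files of a similar shape without any dependency, a rough plain-JDK sketch could look like the following. This is not the Random Beans API: the class CustomerFileGenerator, its value pools and the fixed seed are assumptions made for illustration only.

```java
import java.io.PrintWriter;
import java.util.Locale;
import java.util.Random;

// Hypothetical plain-JDK generator; the post itself uses Random Beans instead.
class CustomerFileGenerator {

    static final String HEADER =
            "id,firstName,lastName,birthDate,email,phone,street,zipCode,city,country";

    // Small value pools loosely based on the sample rows; none contains a comma.
    private static final String[] FIRST_NAMES = {"Liza", "Brad", "Duke", "Tommy"};
    private static final String[] LAST_NAMES = {"Pearson", "Dickson", "Hinton", "Diaz"};
    private static final String[] DOMAINS = {"yopmail.org", "hotmail.com", "hotmail.edu"};
    private static final String[] STREETS = {"Fifth Avenue", "Oxford Street"};
    private static final String[] CITIES = {"NewYork", "Paris", "Rome"};
    private static final String[] COUNTRIES = {"China", "Germany", "Italy"};

    private static String pick(Random random, String[] values) {
        return values[random.nextInt(values.length)];
    }

    // Writes the header followed by `count` random customer lines.
    static void write(PrintWriter out, int count, long seed) {
        Random random = new Random(seed); // a fixed seed gives reproducible files
        out.println(HEADER);
        for (int i = 0; i < count; i++) {
            String firstName = pick(random, FIRST_NAMES);
            String lastName = pick(random, LAST_NAMES);
            String birthDate = String.format(Locale.ROOT, "2015-%02d-%02d",
                    1 + random.nextInt(12), 1 + random.nextInt(28));
            out.println(random.nextInt(1000000) + "," + firstName + "," + lastName + ","
                    + birthDate + "," + firstName + "." + lastName + "@" + pick(random, DOMAINS) + ","
                    + "0102030405" + "," + pick(random, STREETS) + "," + "12345" + ","
                    + pick(random, CITIES) + "," + pick(random, COUNTRIES));
        }
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        PrintWriter out = new PrintWriter("customers.csv", "UTF-8");
        try {
            write(out, 10000, 42L); // change the count to produce the larger files
        } finally {
            out.close();
        }
    }
}
```

Random Beans populates Customer objects directly, so the actual benchmark data is more varied than what these small pools produce.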

The configuration of the Spring Batch and Easy Batch applications is much the same as in the Hello World application of the previous post; only the domain object has changed from Tweet to Customer. Here is the main class to launch Easy Batch:

public class EasyBatchBenchLauncher {

    public static void main(String[] args) throws FileNotFoundException, URISyntaxException {

        File customersFile = new File("customers.csv");

        Job job = new JobBuilder()
                .reader(new FlatFileRecordReader(customersFile))
                .mapper(new DelimitedRecordMapper(Customer.class, "id", "firstName", "lastName", "birthDate", "email", "phone", "street", "zipCode", "city", "country"))
                .build();

        JobExecutor.execute(job);

    }

}

And here is the main class to launch Spring Batch:

public class SpringBatchBenchLauncher {

    public static void main(String[] args) throws Exception {

        ApplicationContext context = new ClassPathXmlApplicationContext("customer-job.xml");

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");

        Job job = (Job) context.getBean("customerJob");

        jobLauncher.run(job, new JobParameters());

    }

}

The complete source code of this benchmark is available on GitHub here.

Results

The benchmark results are the average of 5 executions, run on the following hardware/software configuration:

Hardware:
  • Laptop: MacBook Pro (Retina, 15-inch, Late 2013)
  • CPU: 2 GHz Intel Core i7
  • RAM: 8 GB 1600 MHz DDR3
  • DISK: 251 GB SSD Flash Storage
Software:
  • OS: Mac OS X Yosemite 10.10.3
  • Java: version 1.7.0_67 HotSpot(TM) 64-Bit Server VM
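
The launchers above do not show how the timings were taken. One simple way to average several executions, given here only as a sketch, is a small wall-clock harness like the one below. BenchmarkTimer is a hypothetical name, and there is no JVM warm-up phase, which is acceptable for a coarse measurement in seconds but not for a micro benchmark.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing harness; not taken from the benchmark's source code.
class BenchmarkTimer {

    // Runs the task `runs` times and returns the mean wall-clock time in seconds.
    static double averageSeconds(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, suited to measuring elapsed time
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / TimeUnit.SECONDS.toNanos(1);
    }
}
```

The task would be a Runnable wrapping the body of one of the two main methods, called as BenchmarkTimer.averageSeconds(task, 5).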

The commit-interval is an important parameter for the performance of Spring Batch. I have used different values for this parameter: 10, 100 and 1000. The following table summarizes the number of input records, the file size and the processing time for each framework:

Number of records (file size)   Easy Batch (s)   Spring Batch CI = 10 (s)   Spring Batch CI = 100 (s)   Spring Batch CI = 1000 (s)
10,000 (964 KB)                 <1               3                          3                           2
100,000 (9.4 MB)                1                10                         6                           5
1,000,000 (94 MB)               5                64                         37                          35
10,000,000 (983 MB)             52               417                        343                         337

(CI = commit-interval.)
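
To illustrate what the commit-interval controls: in chunk-oriented processing, items are read one at a time and handed to the writer in chunks of commit-interval items, with one transaction commit per chunk. The sketch below mimics only that grouping in plain Java. It is not Spring Batch's implementation, and ChunkSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of chunking; not Spring Batch code.
class ChunkSketch {

    // Groups items into chunks of at most `commitInterval` elements.
    // Each returned chunk stands for one "write items, then commit" cycle.
    static <T> List<List<T>> chunks(Iterator<T> reader, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<List<T>> committed = new ArrayList<List<T>>();
        List<T> chunk = new ArrayList<T>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == commitInterval) {
                committed.add(chunk); // chunk is full: write and commit
                chunk = new ArrayList<T>(commitInterval);
            }
        }
        if (!chunk.isEmpty()) {
            committed.add(chunk); // last, partially filled chunk
        }
        return committed;
    }
}
```

With 10,000,000 records, a commit-interval of 10 means 1,000,000 chunks, whereas a commit-interval of 1000 means only 10,000. The per-chunk overhead is a plausible reason why larger values are faster in the table, although the benchmark does not isolate it.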

It’s always better to see results in a chart, so here it is:



The difference is more pronounced for very large data sets:


Please note that this is a macro benchmark, not a micro benchmark at the nanosecond level. The goal is to get a rough idea of the whole execution time for both applications.

Conclusion

In this benchmark, Easy Batch is about 6x faster than Spring Batch at reading, parsing and mapping flat data to domain objects. There may be other configurations and use cases where Spring Batch performs better than Easy Batch. If you have done other comparisons, please feel free to share your results.
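
For readers who want to see where the speedup figure comes from, the ratios can be recomputed from the results table with a few lines of Java. The numbers are copied from the table. SpeedupTable is a hypothetical name, and the 10,000-record row is left out because "<1" is not a usable number.

```java
import java.util.Locale;

// Recomputes Spring Batch time / Easy Batch time from the results table.
class SpeedupTable {

    static final int[] RECORDS = {100000, 1000000, 10000000};
    static final int[] COMMIT_INTERVALS = {10, 100, 1000};

    // Times in seconds, one entry (or row) per file size.
    static final int[] EASY_BATCH = {1, 5, 52};
    static final int[][] SPRING_BATCH = {
            {10, 6, 5},
            {64, 37, 35},
            {417, 343, 337}
    };

    static double speedup(int springBatchSeconds, int easyBatchSeconds) {
        return (double) springBatchSeconds / easyBatchSeconds;
    }

    public static void main(String[] args) {
        for (int row = 0; row < RECORDS.length; row++) {
            for (int col = 0; col < COMMIT_INTERVALS.length; col++) {
                double ratio = speedup(SPRING_BATCH[row][col], EASY_BATCH[row]);
                System.out.println(String.format(Locale.ROOT, "%d records, CI = %d: %.1fx",
                        RECORDS[row], COMMIT_INTERVALS[col], ratio));
            }
        }
    }
}
```

The ratios range from 5.0x (100,000 records, commit-interval 1000) to 12.8x (1,000,000 records, commit-interval 10). For the largest file with a commit-interval of 1000, the ratio is about 6.5x, which is consistent with the "about 6x" figure.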

Another interesting benchmark would be to compare parallel versions of the applications, but this will be done in another post, so stay tuned ;-)