My random thoughts about software engineering

An aspiring software craftsman journey, By Mahmoud Ben Hassine

Spring Batch Vs Easy Batch: A Hello World Comparison

Disclaimer: I am the author of Easy Batch and for those who don’t know me, I tend to be a constructive person.

In this post, I will objectively compare the Easy Batch and Spring Batch frameworks in terms of features, ease of use and developer productivity. The goal is not to say that Easy Batch is better than Spring Batch or vice versa, but to show in which situations it is better to use one framework over the other.

First of all, let me start out by saying that I am a VERY big fan of the Spring Framework and all related projects (Spring Data, Spring Batch, etc.). Spring Batch is an awesome framework, really! Entire books have been written on Spring Batch because of the very rich feature set it offers. Nevertheless, in my honest opinion, Spring Batch (and JSR 352) has some shortcomings. Before giving my own point of view, I tried to collect some feedback from Spring Batch users on the web:

"I got a little overwhelmed by the complexity and amount of configuration needed for even a simple example"

"What should we think of the Spring Batch solution? Complex. Obviously, it looks more complicated than the simple approaches. This is typical of a framework: the learning curve is steeper."

"For individuals with no prior Spring Batch knowledge, implementing Spring Batch and creating a functioning job may take as little as a couple workdays. After the initial setup, building fairly complex batch jobs can take around another day"

"Spring Batch application grows pretty quick and involves configuring a lot of stuff that, at the outset, it just doesn't seem like you should need to configure. A "job repository" to track the status and history of job executions, which itself requires a data source - just to get started? Wow, that's a bit heavy handed"

"I recently evaluated Spring Batch, and quickly rejected it once I realized that it added nothing to my project aside from bloat and overhead"

"Verbose configuration. I differ with the choice of the defaults. I would have gone with a non-persistent job repository and a resourceless transaction manager. But then they chose the defaults that showcase their salient features"

"You have to configure the component that launches a batch, the 'jobLauncher'. Simple, but we see that we need a 'jobRepository', which makes it possible to track and resume the progress of tasks. We also see that we need a transaction manager. This property is mandatory, which is, in my opinion, a pity for simple cases like ours where we do not use transactions."

"Spring Batch or How Not to Design an API.. Why do I Need a Transaction Manager? Why do I Need a Job Repository?"

Most of these posts are quite recent. A couple of them may seem outdated, but the criticism still holds for the latest version of Spring Batch (v3.0.3 at the time of writing).

These reactions from the community can be summarized in 3 points:

  • Steep learning curve
  • Complex configuration
  • Mandatory components that you have to configure but probably don’t need

Personally, I don’t mind a steep learning curve if it is worth it (and it is for Spring Batch!). Complex configuration is also something I can accept. But the most annoying thing, in my opinion, is that I am forced to configure components that I don’t need:

  • If my application does not require transactions, why do I need to configure a transaction manager?
  • If my application does not need retry on failure or job history, why do I need to configure a Job Repository (even in memory)?
  • If my application does not write anything, why do I need to specify a writer?
  • If my application does not need chunk processing, why do I need to specify a commit-interval?

The Interface Segregation Principle says that a client should not be forced to depend on methods it does not use. With Spring Batch, I am forced to configure features that I don’t need, which I see as a violation of this principle in a broader sense.
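To make the analogy concrete, here is a minimal sketch in plain Java. The interfaces and classes below (`FatJob`, `ItemSource`, `ItemTransformer` and the two job classes) are hypothetical names invented for illustration; they are not part of Spring Batch or Easy Batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterfaceSegregationSketch {

    // Hypothetical "fat" contract: every job must provide all of these,
    // whether it needs them or not.
    interface FatJob {
        List<String> read();
        String process(String item);
        void write(String item);
        void commit();
    }

    // Hypothetical segregated contracts: a job picks only what it uses.
    interface ItemSource {
        List<String> read();
    }

    interface ItemTransformer {
        String process(String item);
    }

    // With the fat contract, a job that only transforms items still has to
    // stub out write() and commit().
    static class FatUpperCaseJob implements FatJob {
        public List<String> read() { return Arrays.asList("foo", "bar"); }
        public String process(String item) { return item.toUpperCase(); }
        public void write(String item) { /* unused, but mandatory */ }
        public void commit() { /* unused, but mandatory */ }
    }

    // With segregated contracts, the same job declares only what it needs.
    static class UpperCaseJob implements ItemSource, ItemTransformer {
        public List<String> read() { return Arrays.asList("foo", "bar"); }
        public String process(String item) { return item.toUpperCase(); }
    }

    // Depends only on the two small interfaces, nothing else.
    static List<String> run(ItemSource source, ItemTransformer transformer) {
        List<String> results = new ArrayList<>();
        for (String item : source.read()) {
            results.add(transformer.process(item));
        }
        return results;
    }

    public static void main(String[] args) {
        UpperCaseJob job = new UpperCaseJob();
        System.out.println(run(job, job));
    }
}
```

The empty `write()` and `commit()` stubs in `FatUpperCaseJob` play the same role as a transaction manager or job repository that is configured only because the framework insists on it.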

That said, Spring Batch is well suited for use cases where you really need advanced features like retry on failure, remoting, flows, etc. If your batch application does not require such advanced features, using Spring Batch is like using a hammer to kill a fly ;-) In such situations, in-house solutions are usually created from scratch. This is where Easy Batch comes into play, as a lightweight middle ground between Spring Batch and the “Do It Yourself” way.



In this post, I will implement a hello world batch application to process a CSV file containing tweets. I will then try to evaluate the effort needed to configure the application using both frameworks.

The data source is a CSV file containing the following tweets:

id,user,message
1,foo,Spring Batch rocks! #SpringBatch
2,bar,Easy Batch rocks too! and it's easier :wink: #EasyBatch

The goal is to print out tweets in upper case to the console. Records should be mapped to the following Tweet domain object:

public class Tweet {

    private int id;
    private String user;
    private String message;
    // Getters, setters and toString() omitted

}
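For reference, the “Do It Yourself” baseline mentioned earlier could look roughly like the sketch below, using only the JDK. It is not part of the original comparison: the class name `DiyTweetJob`, the nested `Tweet` stand-in with public fields and a `toString()`, and the naive comma splitting are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiyTweetJob {

    // Stand-in for the Tweet domain object (assumption: fields are accessed
    // directly and toString() is added so that tweets print readably).
    static class Tweet {
        int id;
        String user;
        String message;

        @Override
        public String toString() {
            return "Tweet{id=" + id + ", user='" + user + "', message='" + message + "'}";
        }
    }

    // Skips the header line, maps each remaining line to a Tweet
    // and upper-cases its message.
    static List<Tweet> process(List<String> lines) {
        List<Tweet> tweets = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            // Naive split: a limit of 3 keeps commas inside the message,
            // but there is no support for quoted fields.
            String[] fields = line.split(",", 3);
            Tweet tweet = new Tweet();
            tweet.id = Integer.parseInt(fields[0].trim());
            tweet.user = fields[1];
            tweet.message = fields[2].toUpperCase();
            tweets.add(tweet);
        }
        return tweets;
    }

    public static void main(String[] args) {
        // In a real job, these lines would be read from tweets.csv.
        List<String> lines = Arrays.asList(
                "id,user,message",
                "1,foo,Spring Batch rocks! #SpringBatch",
                "2,bar,Easy Batch rocks too! #EasyBatch");
        for (Tweet tweet : process(lines)) {
            System.out.println(tweet);
        }
    }
}
```

This is enough for a hello world, but error handling, reporting, header detection and field mapping all have to be written and maintained by hand, which is the kind of boilerplate both frameworks aim to take over.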

Spring Batch implementation:

First, let’s create a tweet processor to implement the application business logic:

import org.springframework.batch.item.ItemProcessor;

public class TweetProcessor implements ItemProcessor<Tweet, Tweet> {

    @Override
    public Tweet process(Tweet tweet) throws Exception {
        tweet.setMessage(tweet.getMessage().toUpperCase());
        return tweet;
    }

}

Then, create a writer:

import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class TweetWriter implements ItemWriter<Tweet> {

    @Override
    public void write(List<? extends Tweet> items) throws Exception {
        for (Tweet tweet : items) {
            System.out.println(tweet);
        }
    }

}

And finally, configure the application:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd ">

    <bean id="transactionManager"
        class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager"/>
    </bean>

    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>

    <bean id="tweet" class="common.Tweet" scope="prototype"/>

    <bean id="tweetReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="classpath:tweets.csv"/>
        <property name="linesToSkip" value="1"/>
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer">
                    <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <property name="names" value="id,user,message"/>
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                        <property name="prototypeBeanName" value="tweet"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="tweetProcessor" class="springbatch.TweetProcessor"/>

    <bean id="tweetWriter" class="springbatch.TweetWriter"/>

    <batch:job id="helloWorldJob">
        <batch:step id="step1">
            <batch:tasklet>
                <batch:chunk reader="tweetReader" writer="tweetWriter" processor="tweetProcessor"
                 commit-interval="10"/>
            </batch:tasklet>
        </batch:step>
    </batch:job>

</beans>
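A word on the `commit-interval="10"` attribute: in a chunk-oriented step, items are read and processed one at a time, collected into a chunk, and handed to the writer once the chunk reaches the commit interval, at which point the transaction is committed. The following plain Java sketch only illustrates this grouping idea. It is not Spring Batch’s actual implementation, and `ChunkSketch` and `chunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSketch {

    // Groups items into chunks of at most commitInterval elements,
    // preserving their order. The last chunk may be smaller.
    static <T> List<List<T>> chunk(List<T> items, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T item : items) {
            current.add(item);
            if (current.size() == commitInterval) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // remaining items form a final, partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        for (List<Integer> chunk : chunk(items, 2)) {
            // In a real step, the writer would be called here with the whole
            // chunk, followed by a transaction commit.
            System.out.println("write and commit: " + chunk);
        }
    }
}
```

With only two tweets and a commit interval of 10, the writer in this hello world job is called once with both items, so the setting has no visible effect here, yet it still has to be specified.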

Here is the class to launch the application with Spring Batch:

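import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpringBatchHelloWorldLauncher {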

    public static void main(String[] args) throws Exception {

        ApplicationContext context = new ClassPathXmlApplicationContext("job-hello-world.xml");
        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("helloWorldJob");
        jobLauncher.run(job, new JobParameters());

    }

}

Please note that in this case, we have written more lines of configuration code than the application’s business logic itself.

Easy Batch implementation:

With Easy Batch, usually you just have to implement the application business logic and let the framework take care of the boilerplate code of reading, parsing and mapping data to domain objects. So let’s create a tweet processor to implement the batch business logic:

public class TweetProcessor implements RecordProcessor<Record<Tweet>, Record<Tweet>> {

    @Override
    public Record<Tweet> processRecord(Record<Tweet> record) {
        Tweet tweet = record.getPayload();
        tweet.setMessage(tweet.getMessage().toUpperCase());
        return new GenericRecord<>(record.getHeader(), tweet);
    }

}

Then, let’s create a main method to launch the application with Easy Batch:

public class EasyBatchHelloWorldLauncher {

    public static void main(String[] args) throws Exception {

        JobBuilder.aNewJob()
                .reader(new FlatFileRecordReader(new File("tweets.csv")))
                .filter(new HeaderRecordFilter())
                .mapper(new DelimitedRecordMapper(Tweet.class, "id", "user", "message"))
                .processor(new TweetProcessor())
                .writer(new StandardOutputRecordWriter())
                .call();

    }

}

Comparison:

The Tweet object is common to both scenarios, so it is ignored. The following table summarizes the size of the runtime dependencies, the number of lines of code (imports and empty lines are ignored) and the number of XML configuration lines for both solutions:

Metric                 Spring Batch        Easy Batch
Runtime dependencies   14 jars (4.43 MB)   2 jars (130 KB)
Lines of code          23                  19
Lines of XML           37                  0

As you can see, even for a simple application, Spring Batch still requires you to configure a lot of technical plumbing that you don’t really need, which is not the case with Easy Batch. Note that even if Spring Batch is configured with Java instead of XML, you still need roughly the same amount of code to achieve the equivalent of the XML configuration above.

Granted, Easy Batch is more lightweight and may be easier to learn, configure and use, but this does not make it suitable for all use cases. Let’s see a side-by-side comparison of the features of both frameworks:

Feature                     Spring Batch                Easy Batch
Learning curve              Steep                       Small
POJO based development      Yes                         Yes
Parallel processing         Yes                         Yes
Asynchronous processing     Yes                         Yes
Job scheduling              Yes                         Yes
Real time monitoring        Yes                         Yes
Job configuration           Java, XML, Annotations      Java
Transaction management      Declarative, Programmatic   Declarative, Programmatic
Chunk processing            Yes                         Yes
Retry on failure            Yes                         Yes (using a listener)
Remote job administration   Yes                         No
Data partitioning           Yes                         No
Implements the JSR 352      Yes                         No

There is no doubt that Spring Batch surpasses Easy Batch in terms of features, but this comes at a cost: complex configuration and a steep learning curve. The goal of Easy Batch is to keep the framework small and easy to use, yet extensible and flexible, with smart defaults.

Conclusion:

In this post, I tried to show that Easy Batch can be useful in situations where Spring Batch is too heavy for the requirement and developing a new solution from scratch is not an option either. So let me conclude by saying that we should be pragmatic: choose the right tool for the job! If your application requires advanced features like retry on failure, remoting or flows, then go for Spring Batch (or an implementation of JSR 352). If you don’t need all this advanced stuff, then Easy Batch can be very handy to simplify your batch application development.

The source code of this comparison can be found here.