General J2EE: Batch Processing in J2EE - strategies, pros & cons

  1. Anyone developing production batch processes in J2EE these days? I'm having to build lots of them, and there isn't much information around.

    I would like feedback on the approaches outlined in:

    http://www.devx.com/Java/Article/20791


    Any comments on scalability, recoverability, etc. would be greatly appreciated.
  2. One "gotcha" I have seen for J2EE-based batch processing is using Entity beans. In other words, don't. Entities are optimizations for caching and small transactions, and these optimizations break down in large batch jobs. Use JDBC for better control (or maybe a JDBC-based ORM tool like JDO/Hibernate).

    As for the rest of it ... I have to admit it is outside my area of expertise.
  3. One "gotcha" I have seen for J2EE-based batch processing is using Entity beans. In other words, don't. Entities are optimizations for caching and small transactions, and these optimizations break down in large batch jobs. Use JDBC for better control (or maybe a JDBC-based ORM tool like JDO/Hibernate).

    I can't comment on Hibernate or other persistence mechanisms under load, but in my experience any sort of caching of the current operating data (whether it be entities or not) is wasted. This is because batch processes tend to process once and forget: a particular piece of data is never revisited, so caching doesn't add any value.

    Call centres have the same problem: a high rate of calls, with little or no common data from call to call.

    For batch processing I've tended to use pure JDBC so that I can use performance optimisations like batched updates.
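
    A minimal sketch of the batched-update approach mentioned above, assuming a plain JDBC connection that is not enlisted in a container-managed transaction (otherwise the commit and auto-commit calls would not be allowed). The table name, column names, and the `BatchedInsert` class are hypothetical; only the standard `java.sql` calls are real.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedInsert {
    // Hypothetical table and columns, for illustration only.
    static final String SQL = "INSERT INTO batch_item (id, payload) VALUES (?, ?)";

    /**
     * Inserts rows in chunks of batchSize, committing after each chunk.
     * Returns the number of executeBatch() round trips made.
     */
    public static int insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int roundTrips = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();               // queue the row locally, no round trip yet
                if (++pending == batchSize) {
                    ps.executeBatch();       // one round trip for the whole chunk
                    conn.commit();
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) {               // flush the final partial chunk
                ps.executeBatch();
                conn.commit();
                roundTrips++;
            }
        } catch (SQLException e) {
            // Only the current chunk is rolled back; earlier chunks stay committed,
            // so a restart has to know where the job left off.
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return roundTrips;
    }
}
```

    Committing per chunk keeps each transaction small, at the cost of needing some restart marker if the job fails partway through.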
  4. By enabling the J2EE processes (session beans) as web services, you can pretty easily invoke them from the command line. There are situations where I have used Windows Scheduler to invoke J2EE-based batch processes.
  5. Any bean can be invoked from the command line, and I don't need WS to execute a bean. Given that the problem is not to interoperate or distribute across different technologies and systems but rather to scale on one platform (J2EE), I don't understand what WS provides, or perhaps what you're proposing.

    As a side note, web services are fairly verbose, they aren't transactional (i.e. guaranteed), and specifications from OASIS and W3C like WS-ReliableMessaging are yet to be ratified.
  6. Cons

    I programmed batch processes with J2SE and used the operating system for control (cron jobs, Windows Scheduler, or whatever your OS supports), and this worked fine.
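
    A minimal sketch of what such an OS-scheduled J2SE job could look like. The `NightlyBatch` class, its `process` placeholder, and the exit codes are assumptions for illustration; the point is only that the process exit status is how cron or Windows Scheduler learns whether the run succeeded.

```java
import java.util.Arrays;
import java.util.List;

public class NightlyBatch {
    public static final int EXIT_OK = 0;
    public static final int EXIT_FAILED = 1;

    /** Hypothetical placeholder for the real per-item work. */
    static void process(String item) {
        if (item.isEmpty()) {
            throw new IllegalArgumentException("empty work item");
        }
    }

    /** Processes every item in order and returns an exit code for the scheduler. */
    public static int run(List<String> items) {
        int processed = 0;
        try {
            for (String item : items) {
                process(item);
                processed++;
            }
            System.out.println("processed " + processed + " items");
            return EXIT_OK;
        } catch (RuntimeException e) {
            // Report how far the job got so an operator can decide how to restart it.
            System.err.println("failed after " + processed + " items: " + e.getMessage());
            return EXIT_FAILED;
        }
    }

    public static void main(String[] args) {
        // The scheduler only sees the exit status, so make it meaningful.
        System.exit(run(Arrays.asList(args)));
    }
}
```

    A cron entry such as `0 2 * * * java -cp batch.jar NightlyBatch item1 item2` (jar name and arguments hypothetical) would then run it nightly at 02:00.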

    I don't think the J2EE servers are ready for batch processing yet. I played around with using J2EE for batch a little two years ago and had all sorts of problems. This article didn't mention a lot of practical problems such as:

    - EJB pools are not sized correctly for batch processing. On some app servers the min and max number of objects are just suggestions. In batch processing the max number of batches running at a time needs to be a hard limit.
    - EJBs passivate at the wrong intervals for batch processing, and passivation cannot be shut off.
    - Entity beans are not efficient for batch.
    - Alarm beans are only implemented on the newest versions of some servers.
    - Memory constraints on session beans could be problematic for large batch processes.
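
    To illustrate the "hard limit" point from the first item above, here is a minimal sketch of how a plain J2SE process can cap the number of batches running at once with a fixed-size thread pool. It assumes a modern JDK (`java.util.concurrent` and lambdas); `BoundedBatchRunner` is a hypothetical name, not something from the article or from any app server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedBatchRunner {
    private final int maxConcurrent;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    public BoundedBatchRunner(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /**
     * Runs every job, never more than maxConcurrent at the same time.
     * Returns the highest concurrency actually observed.
     */
    public int runAll(List<Runnable> jobs) throws InterruptedException, ExecutionException {
        // A fixed pool never grows past maxConcurrent threads; extra jobs wait in its queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable job : jobs) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        job.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any job failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return peak.get();
    }
}
```

    Unlike a pool size that the server treats as a suggestion, the thread count here is the limit: a job beyond the cap simply waits until a worker is free.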

    One thing that bugged me about this article is that it assumed all of your batch processes could be broken into tiny little subprocesses. In high-throughput batch processing this is almost never done, because you want to achieve maximum throughput (it can never go fast enough) and the quickest method is always to leave everything in one process. What they really seemed to be describing is a messaging architecture, which is similar to batch processing but still somewhat different.

    The main issue I see is that application servers assume an online environment, and until the vendors (BEA and IBM) put out a version of their server specifically for batch, the online environment assumption will get in the way. This article had a lot of nice points, but it's all theory until one of the vendors supports it.