Long-running external batch process integration with EJB?



  1. We are building an EJB application to handle (among other things) list processing. (Think large mailing lists.)

    There is an external application which we want to use to do much of the heavy lifting for the list processing.

    We are structuring our application internally using MDBs to handle most work requests (so the work that we want to do inside the system can be scaled, monitored, etc. using standard JMS patterns).

    Being EJB novices, we are running into a fairly specific problem and are having a hard time finding the Correct Solution. Here's the problem: what is the Right Way to launch the external application from within our EJB system?

    Consider the following scenario:
    - User requests a list to be processed.
    - Web tier sends a message "process this list".
    - MDB receives this message and starts up the external application....

    ...and now that MDB instance is tied up waiting for a long, long time for the external application to complete.

    The knee-jerk traditional Java thing to do here is to have the MDB instance create a thread to monitor the external application. This thread could then send status messages as the external app makes progress, and could report completion when the external app is done.

    But creating threads from inside an EJB container is a no-no.

    What is the right thing? Should we essentially, instead of creating a thread, just post "check external app status" messages to ourselves? How would we throttle those messages to be processed (say) only once a second or so?

    Basically, EJB containers seem oriented towards handling lots of small tasks with few timing constraints, rather than handling long-running (possibly external) batch processes. We would like to avoid writing external Java wrapper applications outside the EJB container to manage these external batch processes, but might that be our only choice here? (ugh!)

    All guidance appreciated.
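    Here is a minimal sketch of the "post check-status messages to ourselves" idea. It assumes the job's progress can be read through some status call, represented here by a hypothetical `IntSupplier`. A `java.util.concurrent.DelayQueue` stands in for a queue whose messages become deliverable only after a delay. JMS 2.0 offers `MessageProducer.setDeliveryDelay` for this, and an EJB 2.1 timer is another option. In a container, each loop iteration below would be one `onMessage` call, not a loop on a thread of our own.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

public class ThrottledStatusCheck {

    /** Hypothetical "check external app status" message, not deliverable before its due time. */
    static final class CheckStatus implements Delayed {
        final String jobId;
        private final long dueNanos;

        CheckStatus(String jobId, long delayMillis) {
            this.jobId = jobId;
            this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /**
     * Handles check messages until the job reports 100%, re-posting a new check
     * no sooner than intervalMillis after each one. Returns the number of checks made.
     */
    static int pollUntilDone(String jobId, IntSupplier percentDone, long intervalMillis) {
        DelayQueue<CheckStatus> queue = new DelayQueue<>();
        queue.put(new CheckStatus(jobId, 0)); // first check is due immediately
        int checks = 0;
        try {
            while (true) {
                CheckStatus msg = queue.take(); // blocks until the message's delay has expired
                checks++;
                if (percentDone.getAsInt() >= 100) {
                    return checks; // done: stop re-posting
                }
                // Not done yet: "send ourselves" another check, throttled to one per interval.
                queue.put(new CheckStatus(msg.jobId, intervalMillis));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling " + jobId, e);
        }
    }

    public static void main(String[] args) {
        int[] progress = {0, 50, 100}; // simulated readings from the external application
        int[] call = {0};
        int checks = pollUntilDone("job-1", () -> progress[Math.min(call[0]++, progress.length - 1)], 1000);
        System.out.println(checks + " checks"); // 3 checks, roughly one second apart
    }
}
```

The throttle lives in the message itself (its due time), so no handler ever sleeps while holding an instance.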
  2. If the external application can signal when its processing is complete, that notification can be posted to a topic or queue that another MDB subscribes to; that MDB can then update the status in your application accordingly.

  3. We have something like this. A long use case runs, and while it is running the user can see its progress. We use Message Driven Beans which send messages to a queue containing a pair of values:

      You could use something like this too: since you know the number of records in the list, you can periodically send a message to a message queue containing the index of the record currently being processed.

      I know it's not a great solution but at least it's a solution ;))
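      Here is a small sketch of that periodic progress message. It assumes the total record count is known up front. The `sink` stands in for whatever actually sends to the queue, such as a JMS sender. The `Progress` class and the "every N records" interval are illustrative choices, not part of the original design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ProgressReporter {

    /** Hypothetical progress message: index of the record just processed, and the total. */
    static final class Progress {
        final int current;
        final int total;

        Progress(int current, int total) {
            this.current = current;
            this.total = total;
        }

        int percent() {
            return total == 0 ? 100 : (int) (100L * current / total);
        }

        @Override
        public String toString() {
            return current + "/" + total + " (" + percent() + "%)";
        }
    }

    private final int total;
    private final int every;               // send one message every `every` records
    private final Consumer<Progress> sink; // stands in for a JMS queue sender

    ProgressReporter(int total, int every, Consumer<Progress> sink) {
        if (every <= 0) throw new IllegalArgumentException("every must be positive");
        this.total = total;
        this.every = every;
        this.sink = sink;
    }

    /** Call after finishing record number `current` (1-based); the last record is always reported. */
    void recordDone(int current) {
        if (current % every == 0 || current == total) {
            sink.accept(new Progress(current, total));
        }
    }

    public static void main(String[] args) {
        List<Progress> sent = new ArrayList<>();
        ProgressReporter reporter = new ProgressReporter(250, 100, sent::add);
        for (int i = 1; i <= 250; i++) {
            // ... process record i here ...
            reporter.recordDone(i);
        }
        System.out.println(sent); // [100/250 (40%), 200/250 (80%), 250/250 (100%)]
    }
}
```

Reporting every N records, rather than every record, keeps the queue traffic proportional to how often the UI actually needs to refresh.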

  4. Oops, one question for Sergiu: you say you "have a long usecase which is running". Running where? Inside an MDB? In an external application? Where exactly does this long-running processing happen in your system?

  5. Sergiu,
    I don't understand your solution... I'm having a similar problem and need ideas :)
    How does your solution really work???
    Let me try to figure it out:
    Your client begins a long transaction from a web interface; this transaction is implemented in an MDB method, and this method saves progress information messages to a CMP entity bean???
    The progress information is accessed by another web page querying the CMP...
    Is your solution something like that???
  6. OK guys, let me give you more details on the case. I said that I have a long use case running. I have two tables, table_a and table_b. The records in table_a have two fields: dateStart and dateEnd. For each record in table_a I read dateStart and dateEnd and write a record to table_b for each day between dateStart and dateEnd. Imagine having 100 records in table_a, each with dateStart=01/jan/2004 and dateEnd=01/jul/2004. That is 6 months with an average of 30 days each, so for each of the 100 records in table_a I have to write 180 records to table_b. In total I have to write 18000 records, so this is a very time-consuming process. The process is started inside a Business Delegate, which reads all the needed records from table_a; for each record, a session bean starts a transaction and writes the records to table_b.
      When this use case is called from the front end (FE), we use MDBs, as I described previously, to notify the user about the progress.
      Sorry for the delay in my response, but it was a great weekend ;)).

      Wish you well,
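      Here is one way to sketch the table_a to table_b expansion with java.time. The post does not say whether the range includes both end dates, so this sketch assumes it does. `DayRow` is a hypothetical stand-in for a table_b row. Counted exactly, 01/jan/2004 to 01/jul/2004 covers 183 days inclusive, because 2004 is a leap year. So 100 such records produce 18,300 rows, and the 18000 above is the 6 × 30 approximation.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DateRangeExpansion {

    /** Hypothetical table_b row: the table_a record it came from plus one calendar day. */
    static final class DayRow {
        final long sourceId;
        final LocalDate day;

        DayRow(long sourceId, LocalDate day) {
            this.sourceId = sourceId;
            this.day = day;
        }
    }

    /** One row per day from dateStart to dateEnd, both inclusive (an assumption). */
    static List<DayRow> expand(long sourceId, LocalDate dateStart, LocalDate dateEnd) {
        List<DayRow> rows = new ArrayList<>();
        for (LocalDate d = dateStart; !d.isAfter(dateEnd); d = d.plusDays(1)) {
            rows.add(new DayRow(sourceId, d));
        }
        return rows;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2004, 1, 1);
        LocalDate end = LocalDate.of(2004, 7, 1);
        long total = 0;
        for (long id = 1; id <= 100; id++) {            // 100 records in table_a
            List<DayRow> rows = expand(id, start, end); // per the post: one session bean transaction per record
            total += rows.size();
        }
        System.out.println(total); // 18300
    }
}
```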
  7. What is your Business Delegate? Is it a long-running MDB? Is it an external Java process? Is it a separate client application? What kind of Java executable is it and where inside or outside your container is it running?

    I hope I am not too insanely dense, but I'm just not understanding exactly what it is :-)

  8. My Business Delegate is a standard implementation of the Business Delegate pattern. It is a simple Java class which delegates calls from the client to the session beans. Its main purpose is to hide the EJB access logic (setting up the initial context, looking up the home interface, obtaining the remote interface) from the client. If you are unfamiliar with the Business Delegate pattern, look at Floyd Marinescu's "EJB Design Patterns". You can find it on this site in PDF format. I cannot give you the exact link because, after the changes they recently made, the links no longer work, but it's here somewhere; you just have to look for it.
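    Here is a minimal sketch of the shape of such a delegate. `ListProcessor` is a hypothetical business interface. The `Supplier` stands in for the real lookup steps: creating an `InitialContext`, looking up and narrowing the home, and calling `create()`. Those steps depend on the application's JNDI names and are not shown. The idea is that the web tier only ever sees the delegate and an application exception, never the EJB plumbing.

```java
import java.util.function.Supplier;

public class ListProcessingDelegate {

    /** Hypothetical business interface that the session bean's remote interface would expose. */
    public interface ListProcessor {
        String processList(long listId) throws Exception; // remote calls can fail (e.g. RemoteException)
    }

    /** Application-level exception so callers never see EJB or naming exceptions. */
    public static class ProcessingException extends RuntimeException {
        public ProcessingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final Supplier<ListProcessor> lookup; // stands in for InitialContext lookup + home.create()
    private ListProcessor service;                // cached after the first lookup; not thread-safe

    public ListProcessingDelegate(Supplier<ListProcessor> lookup) {
        this.lookup = lookup;
    }

    private ListProcessor service() {
        if (service == null) {
            service = lookup.get();
        }
        return service;
    }

    /** The only method the web tier calls. */
    public String processList(long listId) {
        try {
            return service().processList(listId);
        } catch (Exception e) {
            service = null; // drop the cached reference so the next call looks it up again
            throw new ProcessingException("Could not process list " + listId, e);
        }
    }

    public static void main(String[] args) {
        // A stub in place of the session bean, just to exercise the delegate.
        ListProcessingDelegate delegate =
                new ListProcessingDelegate(() -> listId -> "started list " + listId);
        System.out.println(delegate.processList(42)); // started list 42
    }
}
```

Because the delegate is a plain class, it runs on whatever thread calls it (here, the caller's), which is the point the following posts try to pin down.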

  9. OK, so your app server structure is:

    Client webapp -> business delegate class (POJO) ->->-> session beans

    How is your webapp invoking the business delegate? Directly via being part of the same app server? In other words, is your business delegate really a class running in the servlet container? If so, how do you handle returning a response to the user even while the business delegate continues its processing? It seems like you're tying up a servlet thread for the duration of the business delegate's operation... true?

    I guess here's the real question I am asking: what thread is the business delegate running on? A servlet container thread? An EJB container thread (apparently not)? A thread you create yourself inside the servlet container?

    Thanks for being so patient with my trying to get to the bottom of what I am confused about :-)
  10. Post your email here and I will send you an educational project I made about EJB Design Patterns and Struts as an interface. In this you will be able to see how I call Business Delegates, Session Beans, Message Driven Beans from the web tier.
  11. OK, it is robj at nimblefish dot com. However I hope I don't have to dig through too much code to answer my one remaining question, which was in my last post, and which I repeat now: what thread is your business delegate running on, and how does it respond quickly to the UI request while still performing its long-running processing?

  12. Dear Sergiu,
    Would you please email me the educational project you mentioned above?
    Thank you very much.
    bawanglong_qiqi at 163 dot com
  13. Rob -
    To be quite honest, I would not spawn any processes from an MDB. Think of an MDB as being like a Stateless Session Bean. (Also check the J2EE spec; I think this is a no-no.) You shouldn't tie up the MDB instance for that long, nor should you have it interact with system resources, such as calling an external application (it's a server component and can therefore be deployed anywhere, possibly on a server that doesn't have access to the external application). This is what I'd do; keep in mind it's just an opinion, so please take it with a grain of salt... :)

    Using an MDB is a good thing; keep going down the JMS path. If several consumers are interested in the message, send it to a topic with durable subscriptions. If there is only one consumer, send it to a queue. Create a client application that listens to the destination of choice and interacts with the external application. This lets you distribute the interaction with the external application (you can isolate that activity and scale it more effectively). The client app can also send a message back via JMS to signal its completion or failure.

    Anyway, just my $.02 -
    Sounds like a fun project, hope you find this helpful.
    Later -
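    Here is a rough sketch of the client-side wrapper described above. It assumes the external application is a command-line program. `StatusSink` is a hypothetical stand-in for a JMS producer sending to a status queue. The status strings and job ids are illustrative. In this design the wrapper, not an MDB, owns the long wait on the external process.

```java
import java.io.IOException;
import java.util.List;

public class BatchWrapper {

    /** Hypothetical outbound channel; in practice a JMS producer sending to a status queue. */
    public interface StatusSink {
        void send(String jobId, String status);
    }

    private final StatusSink sink;

    public BatchWrapper(StatusSink sink) {
        this.sink = sink;
    }

    /**
     * Handles one "start batch" request: launches the external program, waits for it,
     * and reports STARTED followed by COMPLETED or FAILED. Returns the exit code (-1 if it never ran).
     */
    public int runJob(String jobId, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // unread output must not block the child
        try {
            Process p = pb.start();
            sink.send(jobId, "STARTED");
            int exit = p.waitFor(); // the long wait happens here, outside the EJB container
            sink.send(jobId, exit == 0 ? "COMPLETED" : "FAILED exit=" + exit);
            return exit;
        } catch (IOException e) {
            sink.send(jobId, "FAILED " + e.getMessage());
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            sink.send(jobId, "FAILED interrupted");
            return -1;
        }
    }

    public static void main(String[] args) {
        // Demo only: run this JVM's own "java -version" as the "external application".
        String javaBin = System.getProperty("java.home") + "/bin/java";
        BatchWrapper wrapper = new BatchWrapper((id, status) -> System.out.println(id + ": " + status));
        wrapper.runJob("job-1", List.of(javaBin, "-version"));
    }
}
```

In a real deployment, `runJob` would be invoked from a JMS `MessageListener` in this standalone client, and the sink would post back to the EJB side.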
  14. Reading between the lines of your collective responses (thanks to all of you who answered!), it sounds like the general recommendation is to manage the batch-processing application in some external Java wrapper application that can interact via JMS. Then we keep our internal system logic in MDBs.

    So the pattern is an external Java app which responds to JMS "start batch" messages from inside our EJB system, and which sends JMS "status update" messages back into the system. Then we have (inside our EJB container) stateless session beans which handle the job-starting requests, and MDBs which handle the status update messages and which update our internal job tracking database tables. So the external app never touches the job tracking database, and only interacts via JMS; and the EJB server never directly touches the external app.

    I guess on some level this is the way it would have to be, as the batch processing can't really be made part of the J2EE clustering/failover/etc. system. Oh well. I'm sorry to have to consider deploying and managing another whole application (it's nice to have the whole system inside the EJB container), but looks like it's a bullet we'll have to bite. At least JMS will enable us to keep a uniform communication structure in the system.

    We're very intrigued by IBM's "asynchronous beans" research in WebSphere, but we'll wait to tackle that until it's a bit more widely deployed.... (Google for "IBM asynchronous beans" -- we could really use that kind of capability, as it sounds perfect for this problem!)
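    Here is a small sketch of the job-tracking side of that pattern. It shows the logic an MDB's `onMessage` might delegate to when a "status update" message arrives. The `JobState` values, the message fields, and the in-memory maps (standing in for the job tracking tables) are all assumptions for illustration. The rule it encodes is that a finished job is never reopened by a late or redelivered message.

```java
import java.util.HashMap;
import java.util.Map;

public class JobTracker {

    /** Hypothetical job lifecycle; COMPLETED and FAILED are terminal. */
    enum JobState {
        REQUESTED, RUNNING, COMPLETED, FAILED;

        boolean isTerminal() {
            return this == COMPLETED || this == FAILED;
        }
    }

    private final Map<String, JobState> states = new HashMap<>(); // stands in for the job table
    private final Map<String, Integer> percentDone = new HashMap<>();

    /** Called by the session bean that sends the "start batch" message. */
    void jobRequested(String jobId) {
        states.put(jobId, JobState.REQUESTED);
        percentDone.put(jobId, 0);
    }

    /**
     * Called for each status-update message from the external wrapper.
     * Returns false if the update was ignored (unknown job or job already finished).
     */
    boolean onStatusUpdate(String jobId, JobState reported, int percent) {
        JobState current = states.get(jobId);
        if (current == null || current.isTerminal()) {
            return false; // late or duplicate message: leave the final state alone
        }
        states.put(jobId, reported);
        percentDone.put(jobId, reported == JobState.COMPLETED ? 100 : percent);
        return true;
    }

    JobState state(String jobId) {
        return states.get(jobId);
    }

    Integer percent(String jobId) {
        return percentDone.get(jobId);
    }

    public static void main(String[] args) {
        JobTracker tracker = new JobTracker();
        tracker.jobRequested("job-1");
        tracker.onStatusUpdate("job-1", JobState.RUNNING, 40);
        tracker.onStatusUpdate("job-1", JobState.COMPLETED, 100);
        boolean applied = tracker.onStatusUpdate("job-1", JobState.RUNNING, 90); // stale message
        System.out.println(tracker.state("job-1") + " " + tracker.percent("job-1") + " " + applied);
        // COMPLETED 100 false
    }
}
```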