Performance and scalability: Entity Bean Performance for high data throughput

  1. It has been suggested to me that I should use EJBs to develop a data translation/integration application that will run on a single machine. I am concerned about activation/passivation/container costs in an app that doesn't require distribution but does require very high data throughput to meet its non-functional specs. This application will need to perform tens of thousands of transactions per second, pumping continuous streams of data through to persistent storage. The data objects (potential entity beans) are quite large, with up to 50 fields each, some fields being up to 1000 characters long.

    Since I have read that, even locally, decent-sized entity beans often take 35-50ms to become available due to activation/passivation/data access time/container costs, I can't see how I will meet the performance requirements. Also, given the development model, I can't see the benefits. I know that a Type 4 JDBC driver to Oracle with a simple persistence layer supporting batch updates and cursor-driven large ResultSets will give me the data performance I require.

    Can anyone offer any comment or point to any benchmarks that would be relevant to this situation? Are my performance fears unfounded? Even 25ms per bean access would translate to only about 40 transactions per second on a single thread (1000ms / 25ms), about three orders of magnitude below what is required.
  2. If your transactions are inserts, then the extra overhead of creating and activating entity beans will definitely hurt your performance.

    Any chance of using a large pool of session beans to perform your JDBC work? That way, throwing extra resources at the problem may get you an acceptable level of performance, but the volumes you're talking about are outside my experience, so I'm just guessing.

    A lot of the benchmarks thrown around refer more to read scenarios than to writes, so I'm not sure how helpful they'll be.

  3. When you say tens of thousands of transactions, do you really mean transactions in the sense of atomic updates of multiple persistent stores? If so, the previous poster's suggestion of pooled session beans (accessed through their local interfaces) doing the JDBC work is the right idea.

    If you don't mean atomic updates with rollback requirements, then I would have thought that you wouldn't need EJB at all. JavaBeans/JDBC would certainly provide much better performance.
  4. Can you further clarify what you mean by "tens of thousands of transactions per second"? Do you mean DB transactions, or do you basically mean "inbound requests"?

    Also, why will this be running on a single machine? This thing sounds like it's fairly mission critical. If so, how/why would it ever be on just a single machine? That sounds like a bit of an operations death wish.
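
For reference, the plain JDBC batch-update approach mentioned in the first post (and recommended in replies 2 and 3 as the work a session bean or plain Java class would do) might look roughly like the sketch below. It is only an illustration, not a tuned implementation: the table `translated_record`, its two columns, the class name, and the helper `batchesNeeded` are all hypothetical, and a suitable batch size would have to be found by measurement against the real schema and driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {

    // Hypothetical table and columns, used for illustration only.
    static final String INSERT_SQL =
        "INSERT INTO translated_record (id, payload) VALUES (?, ?)";

    /** Number of executeBatch() calls needed to send rowCount rows. */
    static int batchesNeeded(int rowCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (rowCount + batchSize - 1) / batchSize; // ceiling division
    }

    /**
     * Inserts all rows inside a single database transaction, sending them
     * to the driver in groups of batchSize instead of one statement per row.
     * Each row is assumed to be {id, payload}.
     */
    static void insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit once at the end, not per row
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // flush a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo everything on failure
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The point of the sketch is that the per-row cost becomes a parameter bind plus `addBatch()`, with the database round trip paid once per batch rather than once per object, which is the kind of overhead the original poster is worried about paying per entity bean.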