Performance and scalability: High-volume integration to legacy apps from J2EE app
The application I support is a multi-tiered J2EE app using stateless session beans and coarse-grained, BMP entity beans. One of its requirements is the ability to pull/push time-critical information up/downstream in an 'as fast as possible' manner. We are quickly approaching critical mass with only 1/3 of our install base. Are there any best practices for a J2EE app such as ours?
- Posted by: Dennis Stadler
- Posted on: June 21 2004 14:38 EDT
- runs on WebSphere 4.x; J2EE 1.2; JDK 1.3
- our shell scripts execute per a timed event per interface; they're single-threaded
- each shell script creates a JVM whose main routine delegates to an EJB to execute the interface
- through experience, we've learned that we can't execute more than 4 interfaces simultaneously else our app will experience slow response times
- our app is horizontally cloned between 2 AIX servers with 2 processors per clone
- our JVM memory never goes beyond 80% utilized
Some things we are exploring...
1. multi-threading interfaces (high risk due to 4th bullet above; possible risk mitigator would be to delegate logic to a stored procedure)
2. avoid using WebSphere's connection pool and pass a dedicated connection around
3. avoid the EJB container entirely
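For option (1), one way to reduce the risk is to run the interfaces inside a single JVM behind a small, bounded worker pool, so the 4-interface ceiling the team discovered is enforced in code rather than by scheduling luck. This is only a sketch written for a JDK 1.3-era codebase (no java.util.concurrent); `InterfaceRunner` and the job objects are illustrative names, not from the actual app:

```java
// Sketch: cap concurrent interface runs with a fixed set of worker threads,
// instead of spawning a JVM per shell script. Illustrative only.
import java.util.LinkedList;

public class InterfaceRunner {
    private final LinkedList queue = new LinkedList(); // pending Runnable jobs
    private final int maxConcurrent;

    public InterfaceRunner(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        // one long-lived worker per allowed concurrent interface
        for (int i = 0; i < maxConcurrent; i++) {
            Thread t = new Thread(new Worker());
            t.setDaemon(true);
            t.start();
        }
    }

    public synchronized void submit(Runnable job) {
        queue.addLast(job);
        notifyAll(); // wake an idle worker
    }

    private synchronized Runnable take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait();
        }
        return (Runnable) queue.removeFirst();
    }

    private class Worker implements Runnable {
        public void run() {
            try {
                while (true) {
                    take().run(); // run jobs one at a time per worker
                }
            } catch (InterruptedException e) {
                // shut down quietly
            }
        }
    }
}
```

On JDK 5+ the same idea is a one-liner with `Executors.newFixedThreadPool(4)`; on 1.3, Doug Lea's util.concurrent library is another option.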
If anyone has solved a similar problem or has seen/heard of one, I would greatly appreciate any insight into their success. Right now, J2EE is being questioned as the right technology for our problem.
- High-volume integration to legacy apps from J2EE app by Michael Foley on June 21 2004 17:06 EDT
- High-volume integration to legacy apps from J2EE app by Dennis Stadler on June 22 2004 11:55 EDT
- High-volume integration to legacy apps from J2EE app by Paul Strack on June 23 2004 12:44 EDT
- High-volume integration to legacy apps from J2EE app by Dennis Stadler on June 23 2004 13:46 EDT
- High-volume integration to legacy apps from J2EE app by Paul Strack on June 23 2004 06:50 EDT
- High-volume integration to legacy apps from J2EE app by Dennis Stadler on June 24 2004 06:14 EDT
- High-volume integration to legacy apps from J2EE app by Paul Strack on June 25 2004 12:42 EDT
- High-volume integration to legacy apps from J2EE app by Dennis Stadler on June 25 2004 02:31 EDT
- High-volume integration to legacy apps from J2EE app by Paul Strack on June 26 2004 12:18 EDT
It might make sense to better understand the actual performance constraint on your current system. The alternative approaches you are exploring may or may not address the performance problem you are encountering.
Regarding the "as fast as possible" performance contract...you might want to begin by establishing and documenting exactly what the current performance is: how many transactions per unit of time, and in exactly what configuration. Then as you make changes, you will have a baseline to compare against.
For understanding current performance...you may want to profile one transaction and gain a detailed understanding of exactly where the issues are. What classes/methods/lines are using the most CPU? Are there synchronization or threading issues that are constraining performance? Good tools for this are JProbe and OptimizeIt...and if memory serves me, I think WAS has something built in?
Also...you may not want to jump right at programmatic solutions. There may be configuration changes that can help once you understand the exact nature of the bottleneck. And programming changes are sometimes expensive; adding hardware (another node, vertical cloning, another AIX box, another CPU, or more memory) may be the cheaper way to buy performance.
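Establishing the baseline Michael describes doesn't require a full profiler to get started; even a trivial timing helper around one interface run gives you numbers to document and compare against after each change. The class and method names here are illustrative, not from the app:

```java
// Sketch: a minimal baseline timer, so changes are compared against
// documented numbers rather than impressions. Illustrative names only.
public class Baseline {
    // Runs the given unit of work N times and returns total elapsed millis.
    public static long timeMillis(Runnable work, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            work.run();
        }
        return System.currentTimeMillis() - start;
    }
}
```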
It sounds like you need to take a look at where your bottleneck stems from - specifically which method is constraining your resources.
JView 2004 can help you pinpoint the exact method causing the issue, whether it is simply a time consuming method, or a high CPU utilization method.
There's a 20-day trial at http://www.devstream.com
Thank you Michael/Mario for your prompt responses. To comment on some of your feedback, we are currently trying to run our app through a performance-monitoring tool. We're hoping to have it completed within the next several weeks. Until then, we are dependent on our custom logs.
Here's some further elaboration of our architecture...
1. transactions are managed by our EJBs; each interface has a minimum of 2 EJBs (1 that only controls the flow, the other to manage transactions; other EJBs may be called if they perform work the interface requires)
2. our EJBs for interfaces are part of the same EAR our on-line users interact with; there's only so much tuning we can do until we split our batch interfaces as a separate EAR
We know there are a lot of opportunities with our code which can be addressed through refactoring and/or more hardware. But even if we have these processes executing inside of several seconds, it still won't be fast enough once we hit our full install base. Multi-threading looks inevitable, but dangerous.
We can continue to fine-tune what we have, but my fear is that this design is fundamentally flawed. I've done a fair amount of research and I'm not finding any recommendations/best practices/proven alternatives.
I agree with the suggestion of using a profiler, but my best guess is that the problem is with your Entity beans. Entity beans perform very poorly for batch-style, transactional operations. This seems to be what your system is being used for.
The problem lies with EJB pooling. An Entity bean cannot be returned to the pool while it is being used in a transaction. For example, if you have an Entity pool of 400 beans, your transactions cannot collectively use more than that number of beans concurrently. If each transaction uses about 100 beans, that could be the reason why you are limited to 4 simultaneous operations.
A short-term fix is to increase your Entity bean pool size. A long term fix is to profile your application, determine where the code is having performance problems, and selectively replace the troublesome code with optimized JDBC or stored procedures.
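The "optimized JDBC" route Paul mentions usually means replacing N per-entity loads and stores with one set-oriented statement. A minimal sketch, assuming a hypothetical IFACE_MSG table (the table, column, and status values are made up for illustration, not from the thread):

```java
// Sketch: replace per-entity-bean updates with a single JDBC batch.
// Table/column names (IFACE_MSG, STATUS, MSG_ID) are illustrative assumptions.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchStatusUpdater {
    public static int[] markProcessed(Connection con, long[] ids) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "UPDATE IFACE_MSG SET STATUS = 'P' WHERE MSG_ID = ?");
        try {
            for (int i = 0; i < ids.length; i++) {
                ps.setLong(1, ids[i]);
                ps.addBatch(); // queue the update; one round trip for the batch
            }
            return ps.executeBatch(); // update counts, one per queued statement
        } finally {
            ps.close();
        }
    }
}
```

The win is that container overhead (bean activation, per-row transactions) is paid once per batch rather than once per row.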
Thanks, Paul, for your comments. I know I've asked our administrators this question before, but they were unsure how to check for this. Are you aware of where this can be validated? Is it on the WebSphere console, or is it defined within the EAR file?
It's been a long time since I used Websphere 4, so I can't remember what the admin console looks like. There is something in the Websphere console that tells you what your Entity pool size is. I believe you can also monitor the usage of Entities in transactions, but I am not 100% sure of that (that may be Websphere 5).
As for defining the pool size, it is a value you can set when you configure the EAR file for your EJBs, either using the AAT or WSAD. You may also be able to control it from the console (again, I can't remember).
Just to be clear, this is an Entity bean problem, not a Websphere problem. I have seen this issue come up on a number of different J2EE servers.
I found this link, link name. It appears IBM may have made this hard to find for a reason(?).
Thanks for your help, I'll follow-up with our administrators on this.
Still, I have not heard of or seen documented best practices for high-volume, batch processes in a J2EE environment. Am I doing something that unique in J2EE?
"Still, I have not heard of or seen documented best practices for high-volume, batch processes in a J2EE environment. Am I doing something that unique in J2EE?"
Well, I have seen these same sorts of issues several times now, so at least you are making a common mistake :)
Seriously, though, Entity beans (and most Object-to-Relational frameworks) are built for online, data-entry-oriented applications, where users only deal with small datasets. The optimizations for EJBs are designed to help these kinds of applications: data caching, object pooling, streamlined transaction management.
When you try to use Entity beans for batch-style operations working with large data sets, these same optimizations become a big hindrance. The most common solution here is to avoid using Entities for batch-style operations, and rely on more traditional methods instead: JDBC, optimized SQL, and/or stored procedures.
If you need to use EJB (for example, if you want to initiate your batch process remotely), you can invoke your JDBC/stored procedure code from a session bean, or even a message-driven bean (partially supported on Websphere 4).
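As a sketch of that session-bean-plus-stored-procedure shape: the bean demarcates the transaction and the database does the heavy lifting. The JNDI name `jdbc/AppDS` and the procedure `RUN_INTERFACE` are assumptions for illustration, not names from the thread:

```java
// Sketch: a stateless session bean method that delegates the batch work
// to a stored procedure. Datasource and procedure names are hypothetical.
import java.sql.CallableStatement;
import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class BatchInterfaceBean /* implements javax.ejb.SessionBean */ {
    public int runInterface(String interfaceName) throws Exception {
        // container-managed datasource lookup
        DataSource ds = (DataSource)
            new InitialContext().lookup("java:comp/env/jdbc/AppDS");
        Connection con = ds.getConnection();
        try {
            CallableStatement cs = con.prepareCall("{ call RUN_INTERFACE(?, ?) }");
            cs.setString(1, interfaceName);
            cs.registerOutParameter(2, java.sql.Types.INTEGER); // rows processed
            cs.execute();
            return cs.getInt(2);
        } finally {
            con.close(); // returns connection to the container's pool
        }
    }
}
```

With container-managed transactions, the stored procedure runs inside the bean's transaction, so the session bean still gives you remote invocation and demarcation without entity-bean overhead.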
I guess that is what I'm looking for: validation that a purist approach to batch processing with J2EE is not practical. We would like to get to an end-state where our session beans demarcate transactions but delegate most of the work to stored procedures. Further, EJBs should help us facilitate multi-threading from a non-EJB client. The heart of my question was to determine whether there was a different architecture I hadn't considered, one which doesn't abandon so much of J2EE.
If my on-line application uses entity beans or another persistence layer (Hibernate, JDOs) but my batch processes go against the DB directly, how does my persistence layer become aware of changes to the underlying data? Is caching even an option for me anymore?
"If my on-line application uses entity beans or another persistence layer (Hibernate, JDOs) but my batch processes go against the DB directly, how does my persistence layer become aware of changes to the underlying data? Is caching even an option for me anymore?"
Well ... that is the price you pay for the high throughput of stored procedures. Your data caching becomes much, much more complicated. Your options are:
1) Give up entirely on caching, or only cache for very short periods of time (a single interaction with the user).
2) Build some intelligence into your stored procedure invocations, so that each invocation also sends a signal to flush your cache.
Option (2) can be effective if your stored procedures run rarely (say, once per day). Otherwise, option (1) is probably your best bet, and you can focus on making your read/write operations to the database as efficient as possible.
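Option (2) can be sketched as a thin wrapper that runs the batch and then notifies whatever caches the online tier keeps (for example, clearing a Hibernate second-level cache region). `BatchInvoker` and `CacheFlusher` are hypothetical names for illustration:

```java
// Sketch: couple each batch run to a cache-flush signal, so the online
// tier never serves rows the stored procedure has made stale.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchInvoker {
    public interface Batch { void execute() throws Exception; }
    public interface CacheFlusher { void flush(); }

    private final List flushers = new ArrayList();

    public void addFlusher(CacheFlusher f) {
        flushers.add(f);
    }

    public void run(Batch batch) throws Exception {
        batch.execute(); // e.g. the CallableStatement stored-procedure call
        for (Iterator it = flushers.iterator(); it.hasNext();) {
            ((CacheFlusher) it.next()).flush(); // invalidate stale cached data
        }
    }
}
```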