Dear Forum Members,

I have been handed a J2EE application project that performed well in unit testing but failed in implementation, and I have been asked to find out what is wrong with it and what can be done to make it operational. After spending some time looking through the code and the documentation of the legacy system (EIS) it is supposed to interoperate with, I find I am torn between two recommendations: either do further, intensive testing to identify the parts of the code that cause memory leaks, or reimplement the application to leverage more of the J2EE framework.

Here is the scenario:

The customer has a legacy warehouse inventory system which uses hand-scanners that communicate with a server application hosted on a mainframe. The communication protocol is proprietary and bi-directional, meaning that the server can send arbitrary commands to the hand-scanners as well as receive commands from them. Any state shared between an individual hand-scanner and the server is maintained over the socket connection opened between them. In order to migrate the system to an IP-based communication protocol, a gateway J2EE server has been implemented to take incoming requests from newer hand-scanners and forward them on to the legacy host system.

The new J2EE application has two major components: a web service and a business backend implemented in an MBean. The web service itself doesn't listen for requests but instead invokes classes embedded in the MBean that open a server socket and listen for connections. Those connections are then stored in a HashMap and deleted from the map on expiry of the session or on termination of the connection as per the legacy protocol. The J2EE gateway server then needs to open a serial port connection with the EIS (the mainframe) and forward data received from the hand-scanner on to the mainframe application. Some additional processing of the data is done at the J2EE gateway itself before forwarding the request to the mainframe. The two socket connections (one pair per scanner) are kept open, and the J2EE application routes data between them accordingly.
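
To make the map handling concrete, here is a minimal sketch of the bookkeeping described above. It is not the actual code: the class name SessionRegistry, its methods, and the idle-timeout rule are all my own placeholders, and it assumes Java 7 or later. The point is only that every path that drops an entry from the map also closes the connection it held, and that a reconnect from the same scanner does not orphan the old socket.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of scanner sessions; all names are illustrative. */
class SessionRegistry {

    /** One entry per connected scanner: the open connection and when it was last used. */
    static final class Session {
        final Closeable connection;      // e.g. a java.net.Socket
        volatile long lastSeenMillis;

        Session(Closeable connection, long now) {
            this.connection = connection;
            this.lastSeenMillis = now;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    SessionRegistry(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Stores a new connection; if the scanner was already registered, the old one is closed. */
    void register(String scannerId, Closeable connection, long now) {
        Session old = sessions.put(scannerId, new Session(connection, now));
        if (old != null) {
            closeQuietly(old.connection);
        }
    }

    /** Records activity so the session is not treated as idle. */
    void touch(String scannerId, long now) {
        Session s = sessions.get(scannerId);
        if (s != null) {
            s.lastSeenMillis = now;
        }
    }

    /** Called when the protocol signals termination of the connection. */
    void remove(String scannerId) {
        Session s = sessions.remove(scannerId);
        if (s != null) {
            closeQuietly(s.connection);
        }
    }

    /** Removes and closes every session idle for longer than the timeout; returns the count. */
    int evictIdle(long now) {
        int evicted = 0;
        Iterator<Map.Entry<String, Session>> it = sessions.entrySet().iterator();
        while (it.hasNext()) {
            Session s = it.next().getValue();
            if (now - s.lastSeenMillis > idleTimeoutMillis) {
                it.remove();
                closeQuietly(s.connection);
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return sessions.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful can be done if close itself fails
        }
    }
}
```

In a layout like this, evictIdle would be called periodically with the current time (for example System.currentTimeMillis()), which is one place where the existing code may or may not be doing the equivalent.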

Some salient features of the J2EE application:

  • The web service is only for initializing the objects that initiate socket connections between the hand-scanners and the J2EE gateway.
  • A server socket is opened in the MBean to listen for connections from the hand-scanners.
  • The server socket listening for connections from the hand-scanners is explicitly threaded (i.e. it implements java.lang.Runnable, performs the necessary synchronization on shared objects, etc.).
  • The serial connection to the mainframe is also threaded (i.e. more than one serial connection can be opened to the mainframe to process requests between the mainframe and the scanners).
  • The application is responsible for managing concurrency and memory resources.
  • No servlets or EJBs are used.
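
For readers who want a picture of what "explicitly threaded" means here, the listener is roughly of the following shape. This is a sketch under my own assumptions, not the project's code: the class name ScannerListener is made up, the fixed-size thread pool is my choice (I am not claiming the real code bounds its threads), the echo in handle() merely stands in for the processing and forwarding to the mainframe, and it assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical accept loop for scanner connections; all names are illustrative. */
class ScannerListener implements Runnable {
    private final ServerSocket serverSocket;
    private final ExecutorService workers;

    ScannerListener(int port, int maxWorkers) throws IOException {
        this.serverSocket = new ServerSocket(port);                // port 0 picks any free port
        this.workers = Executors.newFixedThreadPool(maxWorkers);   // bounded number of handler threads
    }

    int getPort() {
        return serverSocket.getLocalPort();
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                final Socket scanner = serverSocket.accept();
                try {
                    workers.execute(() -> handle(scanner));
                } catch (RejectedExecutionException e) {
                    scanner.close(); // pool already shut down: do not leak the socket
                }
            } catch (IOException e) {
                // accept() fails once shutdown() closes the server socket;
                // the loop condition then ends this thread.
            }
        }
    }

    /** Placeholder handler: echoes bytes back instead of forwarding them to the mainframe. */
    private void handle(Socket scanner) {
        try (Socket s = scanner;
             InputStream in = s.getInputStream();
             OutputStream out = s.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException e) {
            // connection dropped; try-with-resources has already closed the socket
        }
    }

    void shutdown() throws IOException {
        serverSocket.close();
        workers.shutdownNow();
    }
}
```

Note that shutdownNow() only interrupts the worker threads, and a blocking socket read does not respond to interruption, so a handler stays alive until its scanner disconnects or its socket is closed. That kind of lingering thread and socket is one of the things I suspect in the real application.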

Here is my take on this: Whoever wrote this could have used a non-managed environment; they could have made this a stand-alone application without the overhead of hosting this on a J2EE application server. But it is nevertheless deployed there, and it would probably take much longer to yank it out of its J2EE moorings than to fix it.

That said, when this application was put into testing, where real people with real hand-scanners tried it out, its memory usage quickly maxed out and brought the server down. My own explanation, from looking at the code, is that they tried to manage their own memory and concurrency and ended up doing a poor job of it. And it is likely that there is little that can be done to improve it.

However, getting the server working as quickly as possible is of critical importance, so I am torn between two paths:

  1. Quick fix: duplicate or simulate the conditions under which the server was brought to its knees, profile its memory usage, and try to identify and fix the collection classes that are holding onto references. As of now, there are simply too many potential places where objects are holding onto references to each other to say for sure, just from looking at the code, where the memory leaks are.
  2. Reimplementation: This is cutting the Gordian Knot. Simply conclude that the present design will be unreliable no matter what, and try to move as much of the business logic as possible into Stateful EJBs and manage the connection with the EIS through JCA connectors. That way, more of the job of managing memory and threading falls to the J2EE app server rather than to our own code.
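
For option 1, before reaching for a full profiler, I am considering something as simple as the following probe, logged periodically while the load is being simulated. It is only a sketch: LeakProbe and its methods are names I made up, and it assumes Java 8 or later. It uses the standard java.lang.management API to read heap usage and reports the size of each collection registered with it, so that a map that only ever grows stands out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical helper that reports heap usage next to the sizes of suspect collections. */
class LeakProbe {
    // name -> function returning the current size of a watched collection
    private final Map<String, IntSupplier> watched = new LinkedHashMap<>();

    /** Registers a collection by name, e.g. watch("sessions", sessionMap::size). */
    void watch(String name, IntSupplier size) {
        watched.put(name, size);
    }

    /** Builds one log line: used heap in bytes, then the size of each watched collection. */
    String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        sb.append("heapUsedBytes=").append(heap.getUsed());
        for (Map.Entry<String, IntSupplier> e : watched.entrySet()) {
            sb.append(' ').append(e.getKey()).append(".size=").append(e.getValue().getAsInt());
        }
        return sb.toString();
    }
}
```

If the heap figure climbs while one of the sizes climbs with it and never drops after scanners disconnect, that collection is a good first suspect. If none of the sizes grow, the leak is elsewhere (threads, sockets, or listener registrations) and a heap dump would be the next step.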

Option 1 is risky because, if the memory problems are not trivial, it won't end up working. If I go with option 2, we might not get the server working in time or within budget, and then, in hindsight, it could be said that option 1 would have worked.

Any comments or suggestions about how to deal with this?

I shall be obliged.