A client of mine (a bank) is deploying its Internet branch on Weblogic 7.0sp2. We are running load tests using OpenSTA and have run into a frustrating problem: the Weblogic server hangs under various loads. Sometimes it hangs while we are waiting for it to come up. Sometimes it hangs some time after a garbage collection has finished (mind you, not always after the first garbage collection run). It has never hung with the JVM in debug mode, and when it hangs in production mode, weblogic.Admin THREAD_DUMP also hangs, so we have no real indication of what the culprit may be.
Additionally, the administrative server hangs when the production server hangs but kicks back into action when we kill the production server.
The server runs on a 4-way Solaris 8 box (Sun V480) with JDK1.3.1_06 and patches recommended by the assigned BEA technician. We have optimized the OS (especially the TCP) parameters as per the Weblogic documents. The box has 4GB RAM and the Weblogic server runs with 2GB of it. Usually, up to the hang, the CPU usage (as shown by sar -u 3 1000) is never more than 50%, the server throughput is around 40-60 and the request queue is hardly ever up to 10. During the hang, the CPUs go fully idle.
The application has servlets and stateless session EJBs for controllers, JSPs for views and entity EJBs for models. It has a few external systems to talk to through JDBC and CORBA. All databases are Oracle; all instances except one are 8.1.7 and up. That lone instance runs 18.104.22.168, so we had to run Weblogic with the 9i drivers instead of the Weblogic-supplied drivers. We use the OCI driver. All connections are XA connections.
The technician from BEA says he has seen the likes of it but cannot give any help without the thread dumps.
Got any ideas to help? I'd very much like to hear them.
Our system also used to hang under various loads.
However our problem had nothing to do with hardware.
It was due to acquiring too many locks on the database without properly releasing them. Even the EJBs that are used just to read data were updating the database.
Since we fixed the locking issue (via the isModified() method), our system has been stable.
I am not too sure if you are facing a similar issue or some software/hardware configuration issue.
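To illustrate the isModified() idea, here is a minimal sketch of a dirty flag in plain Java. It is not our actual code: AccountState, WriteCountingStore and the field names are made up for the example, and the store class only stands in for whatever writes the bean back (the container or ejbStore). As far as I know, Weblogic lets you name such a method with is-modified-method-name in weblogic-ejb-jar.xml for EJB 1.1 CMP beans, but check the documentation for your version.

```java
// Hypothetical bean state; all names here are illustrative only.
class AccountState {
    private final String owner;
    private long balanceCents;
    private boolean modified; // dirty flag, set only by real changes

    AccountState(String owner, long balanceCents) {
        this.owner = owner;
        this.balanceCents = balanceCents;
    }

    // Reads never touch the dirty flag.
    String getOwner() { return owner; }
    long getBalanceCents() { return balanceCents; }

    // A setter marks the state dirty only if the value actually changes.
    void setBalanceCents(long newBalanceCents) {
        if (newBalanceCents != balanceCents) {
            balanceCents = newBalanceCents;
            modified = true;
        }
    }

    // Asked before a write-back to decide whether an UPDATE is needed.
    boolean isModified() { return modified; }

    // Reset once the state has been written.
    void clearModified() { modified = false; }
}

// Stand-in for the code that would issue the UPDATE (and take the row lock).
class WriteCountingStore {
    int writes;

    void storeIfModified(AccountState state) {
        if (!state.isModified()) {
            return; // read-only access: no UPDATE, no lock
        }
        writes++; // a real store would run the UPDATE here
        state.clearModified();
    }
}

class IsModifiedDemo {
    public static void main(String[] args) {
        AccountState state = new AccountState("alice", 1000L);
        WriteCountingStore store = new WriteCountingStore();

        state.getBalanceCents();      // read only
        store.storeIfModified(state); // skipped

        state.setBalanceCents(2500L); // real change
        store.storeIfModified(state); // written once

        System.out.println("writes = " + store.writes);
    }
}
```

The point is only that a bean used for reading should reach the end of the transaction without triggering a write.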
While I cannot vouch for the quality of the code, I have run a few checks on their JDBC code in particular and they seem to know their stuff.
They don't have read-only EBs and they use EJB2.0 CMP, so I don't think isModified() needs to be fixed, but I'll put it on the checklist anyway.
I would start from the fact that it hangs on startup.
On startup a couple of things happen:
1: JDBC connection pools are created
2: beans are deployed
3: in the case of CMP the database is contacted; otherwise there is no querying of the database
4: the startup class is invoked
Perhaps you can start by answering the following questions:
1: What do you have in the startup class? Does removing some code from the startup class make any difference?
2: What drivers are you using? If you use a different driver, do you see any difference?
3: Which beans are you deploying? Can you try deploying as few beans as possible and then add one bean at a time to the application to see which one causes the bottleneck?
4: Examine the database settings.
I have seen some really weird problems on Weblogic + Oracle 9.x. We once had a query which would fail if it had an ORDER BY clause. The solution? We ran the analyzer on the database! Talk to your database admin to see if he can spot some potholes.
There are no startup classes.
As I pointed out, they are not using Weblogic drivers because connection to an old database instance requires Oracle 9i drivers.
We have tried deploying without connections to the old Oracle instance and the related SLSBs (the legacy application involved requires the system to execute a stored procedure to queue the operation, this is done in a SLSB); the outcome wasn't different.
What puzzles me is why we don't observe the same behaviour when Weblogic runs in a debug-enabled JVM.
We had a single datasource for each database. The SLSBs got connections from the same datasource (the same connection pool) for both invoking stored procedures and running queries.
We now have two datasources for each legacy database; one is reserved for stored procedure invocation and the other for queries. We no longer observe the hang. Don't ask me why.
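For anyone who wants to try the same arrangement, here is a rough sketch of what the split looks like from the SLSB's side. It is not the client's code: LegacyDbConnections, TwoPoolsDemo and the method names are invented for the example, and the demo uses a fake DataSource that only counts calls. In the server, the two DataSource objects would come from JNDI lookups under whatever names the deployment defines. The sketch shows only the separation, not why it made the hang go away.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper: one pool for queries, another for stored procedures.
class LegacyDbConnections {
    private final DataSource queryDataSource;
    private final DataSource procedureDataSource;

    LegacyDbConnections(DataSource queryDataSource, DataSource procedureDataSource) {
        this.queryDataSource = queryDataSource;
        this.procedureDataSource = procedureDataSource;
    }

    // Connections for plain queries come only from the query pool.
    Connection forQuery() throws SQLException {
        return queryDataSource.getConnection();
    }

    // Connections for stored procedure calls come only from the procedure pool.
    Connection forProcedure() throws SQLException {
        return procedureDataSource.getConnection();
    }
}

class TwoPoolsDemo {
    // Builds a stand-in DataSource that only counts getConnection() calls.
    static DataSource countingDataSource(final int[] counter) {
        return (DataSource) Proxy.newProxyInstance(
                TwoPoolsDemo.class.getClassLoader(),
                new Class<?>[] { DataSource.class },
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("getConnection")) {
                        counter[0]++;
                    }
                    return null; // no real connection in this demo
                });
    }

    public static void main(String[] args) throws SQLException {
        int[] queryCalls = { 0 };
        int[] procedureCalls = { 0 };
        LegacyDbConnections db = new LegacyDbConnections(
                countingDataSource(queryCalls),
                countingDataSource(procedureCalls));

        db.forQuery();
        db.forQuery();
        db.forProcedure();

        System.out.println("query pool: " + queryCalls[0]
                + ", procedure pool: " + procedureCalls[0]);
    }
}
```

With this shape, a stored procedure call can never take a connection that a query is waiting for, and vice versa.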