News: Scalability Issues with Dynamic Proxy Based Containers Report

  1. Rice University has released an article on the combined effect of application implementation method, container design, and efficiency of communication layers on the performance scalability. They used JBoss and JOnAS open source EJB containers for their evaluation.

    Full article available at http://www.cs.rice.edu/CS/Systems/DynaServer/perf_scalability_ejb.pdf
     
    Web site with full experiment reports is http://www.cs.rice.edu/CS/Systems/DynaServer/

    Here is the article abstract:
    "We investigate the combined effect of application implementation method, container design, and efficiency of communication layers on the performance scalability of J2EE application servers by detailed measurement and profiling of an auction site server.
     
    We have implemented three versions of the auction site. The first version uses stateless session beans with bean-managed persistence, making only minimal use of the services provided by the Enterprise JavaBeans (EJB) container. The second version makes extensive use of the container services using entity beans with container-managed persistence. The third version applies the session façade pattern, using session beans as a façade to access entity beans. We evaluate these different implementations on two popular open-source EJB containers with orthogonal designs. JBoss uses dynamic proxies to generate the container classes at run time, making an extensive use of reflection. JOnAS pre-compiles classes during deployment, minimizing the use of reflection at run time. We also evaluate the communication optimizations provided by each of these EJB containers.

    The most important factor in determining performance is the implementation method. EJB applications with session beans perform as well as a Java servlet implementation and an order-of-magnitude better than most of the implementations based on entity beans. Use of session façade beans improves performance for entity beans, but only if local communication is very efficient. Otherwise, session façade beans degrade performance.
    For the implementation using session beans, communication cost forms the major component of the execution time on the EJB server. The design of the container has little effect on performance. For implementations using session façade beans, local communication cost is critically important. With entity beans, the design of the container becomes important as well. In particular, the cost of reflection affects performance."
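
    To make the two container designs in the abstract concrete, here is a minimal sketch in plain Java. It is not code from the paper, JBoss, or JOnAS: the `Bidder` interface and every class name are hypothetical. One wrapper is built at run time with `java.lang.reflect.Proxy` and dispatches through reflection; the other is an ordinary class of the kind a deployment-time generator could emit.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical business interface; not taken from the auction site code.
interface Bidder {
    String placeBid(String item, int amount);
}

// Plain implementation standing in for a bean class.
class BidderBean implements Bidder {
    public String placeBid(String item, int amount) {
        return item + ":" + amount;
    }
}

// Hand-written wrapper standing in for a class generated at deployment time.
class PrecompiledBidder implements Bidder {
    private final Bidder target;
    private final AtomicInteger calls;

    PrecompiledBidder(Bidder target, AtomicInteger calls) {
        this.target = target;
        this.calls = calls;
    }

    public String placeBid(String item, int amount) {
        calls.incrementAndGet();               // container services would run here
        return target.placeBid(item, amount);  // direct call, no reflection
    }
}

class ContainerStyles {
    // Wrapper built at run time: every call is dispatched through Method.invoke.
    static Bidder dynamicBidder(Bidder target, AtomicInteger calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.incrementAndGet();           // container services would run here
            return method.invoke(target, args);
        };
        return (Bidder) Proxy.newProxyInstance(
                Bidder.class.getClassLoader(),
                new Class<?>[] { Bidder.class },
                handler);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Bidder dynamic = dynamicBidder(new BidderBean(), calls);
        Bidder precompiled = new PrecompiledBidder(new BidderBean(), calls);
        System.out.println(dynamic.placeBid("lamp", 10));
        System.out.println(precompiled.placeBid("lamp", 10));
        System.out.println("intercepted calls: " + calls.get());
    }
}
```

    Both wrappers behave the same from the caller's side; the difference is only in how the call reaches the bean, which is where the reflection cost mentioned in the abstract would come from.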

    Threaded Messages (17)

  2. Has anyone done anything similar for EJB 2.0 implementations, especially using local interfaces and the session facade?

    By the looks of things, communication is a major factor, and using local interfaces for entity beans with stateless session beans as facades will get around a lot of the scalability issues.

    JBoss 3.0 BETA 2 seems to have the above mentioned features.
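
    As a rough illustration of the facade idea (plain Java, no EJB APIs; `ItemEntity`, `ItemView` and `ItemFacade` are made-up names, and the counters only stand in for calls that would cross a container boundary), the client makes one coarse-grained call and the fine-grained getters stay next to the entity:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical entity; every call across its interface is counted.
class ItemEntity {
    private final String name;
    private final int price;
    private final AtomicInteger calls;

    ItemEntity(String name, int price, AtomicInteger calls) {
        this.name = name;
        this.price = price;
        this.calls = calls;
    }

    String getName() { calls.incrementAndGet(); return name; }
    int getPrice()   { calls.incrementAndGet(); return price; }
}

// Value object handed back to the client in a single call.
final class ItemView {
    final String name;
    final int price;

    ItemView(String name, int price) {
        this.name = name;
        this.price = price;
    }
}

// Facade: one coarse-grained method; the getters run on the entity's side.
class ItemFacade {
    private final ItemEntity entity;
    final AtomicInteger clientCalls = new AtomicInteger();

    ItemFacade(ItemEntity entity) {
        this.entity = entity;
    }

    ItemView describe() {
        clientCalls.incrementAndGet();
        return new ItemView(entity.getName(), entity.getPrice());
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        AtomicInteger entityCalls = new AtomicInteger();
        ItemFacade facade = new ItemFacade(new ItemEntity("lamp", 10, entityCalls));
        ItemView view = facade.describe();
        System.out.println(view.name + " " + view.price
                + " clientCalls=" + facade.clientCalls.get()
                + " entityCalls=" + entityCalls.get());
    }
}
```

    Whether this helps in practice depends on how cheap the facade-to-entity calls are, which is exactly the point the paper makes about local communication.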
  3. They used JBoss optimized calls and JOnAS-Jeremie, both of which are similar optimizations to local interfaces (according to the paper). What I found interesting was the advertised line counts:

    Servlets only: 4590
    Session beans: 7920
    EB CMP: 11320
    Session facade: 13440

    As servlets had the best performance too, this seems to strengthen the claim that servlets indeed are superior in smaller website projects where you don't really need all the features of EJBs.
  4. I haven't read the article yet, but reflection has improved a lot in 1.4; see here for some performance numbers:

    http://neotis.de/javangelist/wiki-view?oid=4B8180808080808080808080808085neotis

    bye
    -stephan
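
    For anyone who wants to check the reflection overhead on their own JVM, a crude sketch follows. It is an assumption-laden micro-benchmark (no JIT control, a single timing run), not the methodology of the linked page or of the paper, and `ReflectionCost` is a made-up class name. It runs the same trivial method directly and through `Method.invoke`.

```java
import java.lang.reflect.Method;

class ReflectionCost {
    // Trivial target so that call overhead dominates.
    static int addOne(int x) {
        return x + 1;
    }

    // n direct calls.
    static int runDirect(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc = addOne(acc);
        }
        return acc;
    }

    // The same n calls dispatched through java.lang.reflect.Method.
    static int runReflective(int n) {
        try {
            Method m = ReflectionCost.class.getDeclaredMethod("addOne", int.class);
            int acc = 0;
            for (int i = 0; i < n; i++) {
                acc = (Integer) m.invoke(null, acc);
            }
            return acc;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        runDirect(n);      // warm-up
        runReflective(n);  // warm-up
        long t0 = System.nanoTime();
        int a = runDirect(n);
        long t1 = System.nanoTime();
        int b = runReflective(n);
        long t2 = System.nanoTime();
        System.out.println("direct ns: " + (t1 - t0)
                + ", reflective ns: " + (t2 - t1)
                + ", same result: " + (a == b));
    }
}
```

    The ratio it prints varies a lot with JVM version and warm-up, so treat it as a sanity check rather than a measurement.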
  5. Check the implementation of the version which uses stateless session beans with bean-managed persistence. It's horrible.
    If you call a method like "getCategories", the method returns a String that directly contains the HTML code which renders the list of categories!
    Of course, this implementation will beat any other one, but it is not maintainable.
    Everybody knows that if you write unmaintainable code, you generally obtain better performance.
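
    To show the difference being argued about, here is a small sketch. It is not the RUBiS code; `CategoryService`, `CategoryView` and the sample categories are invented. The first method returns ready-made markup from the bean, the second returns data and leaves rendering to the web tier.

```java
import java.util.Arrays;
import java.util.List;

class CategoryService {
    // Invented sample data standing in for a database query.
    private final List<String> categories = Arrays.asList("Books", "Cars");

    // Style criticized above: the bean returns HTML directly.
    String getCategoriesAsHtml() {
        StringBuilder sb = new StringBuilder("<ul>");
        for (String c : categories) {
            sb.append("<li>").append(c).append("</li>");
        }
        return sb.append("</ul>").toString();
    }

    // Alternative: return plain data; presentation lives elsewhere.
    List<String> getCategories() {
        return categories;
    }
}

class CategoryView {
    // Rendering kept in the web tier (a servlet or JSP in a real application).
    static String render(List<String> categories) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (String c : categories) {
            sb.append("<li>").append(c).append("</li>");
        }
        return sb.append("</ul>").toString();
    }

    public static void main(String[] args) {
        CategoryService service = new CategoryService();
        System.out.println(service.getCategoriesAsHtml());
        System.out.println(render(service.getCategories()));
    }
}
```

    The output is identical; the second form costs an extra data transfer between tiers but lets the markup change without touching the bean.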
  6. Some of their conclusions were very interesting in comparing PHP, servlets, and EJB, especially the "Under all circumstances, EJB is considerably slower..."
    I question this somewhat because of their design. Based on what I picked up scanning through their document, they used entity beans a lot, and I wonder how well the app servers would perform without entity beans, using DAOs instead.

    Also, did anyone notice if they created their own connection pooling mechanism when doing the servlet implementation? I haven't looked at the code yet.
  7. Interesting study. Couple of observations -

    1. I am not sure whether they used fine-grained access for their 'DAO EB CMP' design alternative: in section 3.1.2 they mention they used bulk accessors to avoid fine-grained access, whereas in the summary, in section 6.4, they conclude that the DAO EB CMP separation gives the least scalable results because of excessive fine-grained access.

    2. Assuming they extended the 'DAO EB CMP' design by wrapping up the data access in the session bean facade, how did they design their session bean method accessor(s) for the entity bean attributes, especially for the read-only operations? If these are not bulk calls scoped in one transaction then each getter operation would result in a pair of ejbLoad/ejbStore calls. Even if these are local calls, the overhead for these extra calls can be quite high.
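
    The ejbLoad/ejbStore concern in point 2 can be sketched with a toy model. This is a simulation under stated assumptions, not container code: `SimulatedEntity` is hypothetical, and it assumes a container that loads state when a transaction begins and stores it at commit, with every getter outside a transaction getting its own.

```java
// Toy model of an entity whose state is loaded at transaction start
// and stored at commit; the counters stand in for ejbLoad/ejbStore.
class SimulatedEntity {
    int loads;
    int stores;
    private final String name = "lamp";
    private final int price = 10;
    private boolean inTx;

    void begin()  { inTx = true; loads++; }    // stands in for ejbLoad
    void commit() { stores++; inTx = false; }  // stands in for ejbStore

    // A getter called outside a transaction runs in its own implicit one.
    String getName() {
        boolean own = !inTx;
        if (own) begin();
        try {
            return name;
        } finally {
            if (own) commit();
        }
    }

    int getPrice() {
        boolean own = !inTx;
        if (own) begin();
        try {
            return price;
        } finally {
            if (own) commit();
        }
    }

    // Bulk accessor: both reads share one transaction.
    String describe() {
        begin();
        try {
            return getName() + ":" + getPrice();
        } finally {
            commit();
        }
    }

    public static void main(String[] args) {
        SimulatedEntity fine = new SimulatedEntity();
        fine.getName();
        fine.getPrice();
        System.out.println("fine-grained loads/stores: " + fine.loads + "/" + fine.stores);

        SimulatedEntity bulk = new SimulatedEntity();
        bulk.describe();
        System.out.println("bulk loads/stores: " + bulk.loads + "/" + bulk.stores);
    }
}
```

    Under these assumptions, two separate getters cost two load/store pairs while the bulk call costs one, which is the overhead being asked about.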

    And of course, it would have helped if JOnAS or JBoss had read-only EB optimizations ;) It would also have been interesting to throw into the mix the optimized data loading strategies that 2.0 CMP allows container vendors to implement, e.g., lazy loading, loading of frequently used field groups, etc.

    I hope they study a 2.0 CMP implementation soon.

    -Satadru
  8. "If you call a method like "getCategories", the method returns a String that directly contains the HTML code which renders the list of categories!"

    Yes, this is a deliberate choice. We wanted to reduce the interactions between the servlets and the beans as much as possible in this version. What this study shows is that there is a tradeoff between performance and "beauty of design". Having nice, maintainable code like the EB or Session façade versions has an extra cost.
    Even if "Everybody knows that if you write unmaintainable code, you generally obtain better performance", I am not so sure that everybody is convinced that EJB can perform as well as Servlets with code that is at least not less unmaintainable.
  9. Wow. JOnAS is ruling JBoss.
  10. Just as a side-note: You can definitely write unmaintainable code that performs badly. But for all practical situations only maintainable code scales and performs in the long run.

    Writing a super-optimized spaghetti system in machine code that, running on just a pocket calculator, outperforms a maintainable system running on a Sun Starfire is possible as a research toy project. In practical life (where hopefully we end up sooner or later) it's impossible to optimize a very large system built using bad engineering practices. The Session facade is one of the more important patterns for establishing such an architecture, making it possible to optimize bottleneck services without changing the applications built on top of the services.
  11. JDK 1.4 improves reflection performance to only 2x the cost of a regular method call... this may improve JBoss' standing, though I can't read the article because I'm on my Zaurus. Hey, has anyone found a port of xpdf for this thing?

    Stu
  12. We have added a postscript version of the paper on the Web site.
  13. Emmanuel

    First of all, nice job. Some comments/questions.

    Where are the times in which you are blocking for DB results attributed in the "execution breakdown"? Are they in 'communication' or are they subtracted out somehow?

    Kind of a nit, but it threw me at first: Your terminology is a little confusing. In general, BMP refers to BMP Entity Beans, which is not what you describe in 3.1.1. And DAO generally is its own beast - not the same as CMP (3.1.2), though sometimes implemented with BMP.

    It would be interesting to compare BMP EB and CMP EB performance across servers, to gauge the container optimizations in CMP over BMP counterparts, using EJB 2.0 of course.

    Though it would break the Open Source theme, it would be cool to see how a commercial product fares.


    Mike
  14. "Where are the times in which you are blocking for DB results attributed in the "execution breakdown"?
    Are they in 'communication' or are they subtracted out somehow?"
    What is present in 'communication' is the real communication time spent sending and receiving data. The blocking for DB results does not use CPU time (it is not a spinlock) and is overlapped with the processing of the other requests.

    "Your terminology is a little confusing."
    Yes, I agree with you, we shouldn't talk about BMP with stateless session beans (it has really no meaning in this case). This was just to underline that SQL requests are written by hand in the code vs generated by the container in the CMP case.

    I also agree that the EB BMP vs CMP is an interesting comparison and we plan to do it.

    "Though it would break the Open Source theme, it would be cool to see how a commercial product fares."
    I have learned today that some people have started to port RUBiS on WebLogic. I hope that they will get results soon so that we can share them with the community.

    Emmanuel
  15. Emmanuel, I think this is solid research and definitely a good start to get quantitative data around what type of J2EE architecture people should be using in real-world projects. Some comments though...

    I would agree with Mike Finn's comment. The current usage of the terms BMP and DAO is not so much "confusing" as it is "wrong" :) I am pretty curious as to how BMP compares to CMP performance-wise, but this paper doesn't address it.

    Another point that maybe other J2EE folks can confirm or refute: you mention that an advantage of using Session BMP is that you get connection pooling, whereas with Servlets you have to write the pooling yourself. I think this is wrong; you can still use the DataSource in servlets identical to how you used it in your session BMP example, e.g. http://www-4.ibm.com/software/webservers/appserv/doc/v35/ae/infocenter/was/040204020102.html.

    Also, you use the term scalability interchangeably with performance. In the context of a J2EE application, scalability means the "ability to economically support the required quality of service as load increases"; i.e. adding servers or CPUs or RAM. This is supposedly the point of EJB - as your needs increase, you can add more boxes and your problems go away; this is something you didn't test. Of course, whether this works or not is debatable... This seems like something you could test by adding another box to the mix and then seeing how the performance pans out.
  16. "with Servlets you have to write the pooling yourself. I think this is wrong; you can still use the DataSource in servlets identical to how you used it in your session BMP example"

    I agree that the Web container should provide connection pooling, but we have not been able to find such support in the version of Tomcat we used. In the Servlets implementation, we deal directly with the MySQL JDBC driver, and this driver does not implement connection pooling.
    The reference you give on the IBM site only works with servlets in front of EJBs, and in this case we do what they propose. If you have servlets only, someone should provide the JNDI and the datasource, but Tomcat doesn't as far as I know.
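
    For readers wondering what "writing the pooling yourself" involves, here is a minimal sketch. It is not the RUBiS servlet code: `SimplePool` is a hypothetical class, and it is generic over the pooled type (rather than tied to `java.sql.Connection`) so that it runs without a JDBC driver.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Fixed-size pool: resources are created once and then reused.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free.
    T acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    // Hands a resource back for reuse.
    void release(T resource) {
        idle.offer(resource);
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        // With JDBC, the factory would open a connection via DriverManager.
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder r = pool.acquire();
        System.out.println("available while in use: " + pool.available());
        pool.release(r);
        System.out.println("available after release: " + pool.available());
    }
}
```

    A container-provided DataSource does the same job plus validation and timeouts, which is why the question of whether the web container offers one matters here.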


    "This seems like something you could test by adding another box to the mix and then seeing how the performance pans out."

    Yes, but if you add a node, either your server supports clustering (which is not yet the case) or you statically distribute your beans. We tested this last case, but the load is not well balanced between the different beans. Therefore, it is hard to find a good partition of the beans that provides an interesting speedup. Only dynamic load balancing at the server side can improve things. But it is part of our plans to evaluate clustering features of EJB containers as soon as they become available (work in progress for both JBoss and JOnAS).

    Emmanuel
  17. As far as I can tell, dynamic proxies fall short for CMP 2.0 (does JBoss still use dynamic proxies for CMP 2.0?).

    I am also curious to see a similar report for "enterprise class" application servers: production database (not MySQL :-)), CMP 2.0, clustering and local interfaces. I think it will soon be the most common scenario in enterprise applications.

    --
    Cedric
  18. Figure 12 says it all. In all 4 cases more than 90% of the execution time was spent on fine-grained EJB communication and fine-grained database access.

    Local interfaces address the communication overhead issue. EJBQL and relationships fix the fine-grained database access issues.

    Hopefully a new version of this paper will report execution times for an EJB 2.0 implementation. This paper was an excellent quantification of the fact that fine-grained entity beans simply don't work well. Fortunately relationships and local interfaces solve this problem.

    I would like to see WebLogic 6.1 thrown into the mix. If the next version of the paper would report on the benefits of read-only entity beans and invalidating caches, that would be great.