Another Performance Comparison of Middleware Architectures

  1. A "Performance Comparison of Middleware Architectures for Generating Dynamic Web Content" research paper was recently published, in which a web app's performance characteristics were compared across PHP, Servlet, and Servlet/EJB implementations.

    The paper concludes:

    "While Java servlets are less efficient than PHP, their ability to execute on a different machine from the Web server and their ability to perform synchronization leads to better performance when the front-end is the bottleneck or when there is database lock contention. EJB facilities and services come at the cost of lower performance than both PHP and Java servlets."

    Read Performance Comparison of Middleware Architectures for Generating Dynamic Web Content.

    The paper was released by some of the same people who released last year's Scalability Issues with Dynamic Proxy Based Containers Report.

    Threaded Messages (34)

  2. PHP Resource contention

    I can't seem to download the report. Apparently the PHP-backend of the site hosting the paper must have some resource contention problems ;-).

    Alef
  3. Where is the paper?

    I can't find the paper... Where can I find that?
  4. Where is the paper?

    I've found it at Google. Is it the same?

    http://www.cs.rice.edu/~sameh/papers/middleware2003/middleware2003.pdf
  5. Correct Link

    Sorry about that. Here's the correct link:

    http://www.theserverside.com/resources/articles/Cecchet/Cecchet.pdf

    Regards,
    Nitin
  6. Caching

    Why isn't any caching mechanism used for the EJB method, so it doesn't have to access the DB each time?
  7. arf

    yuck, they tested tomcat and jonas

    that isn't really fair now is it, it'd be interesting to have seen it run on resin or orion, resin itself claims to outperform php (though that's a pretty old benchmark)
  8. arf

    > yuck, they tested tomcat and jonas

    Testing Tomcat would have been a useful exercise if they had used anything remotely close to current. Instead, they used version 3.2.4, which has been superseded by three production releases since (3.3, 4.0, 4.1), in which the performance improvements are pretty substantial.

    Yah ... arf.

    > that isn't really fair now is it, it'd be interesting to have seen it run on resin or orion, resin itself claims to outperform php (though that's a pretty old benchmark)

    Those numbers would be interesting as well.

    Craig McClanahan
  9. Where is the paper?

    Where is the Paper?

    Regards Sue
  10. A little dated....

    JDK 1.3.1
    Tomcat 3.2.4

    Come on, JDK 1.4.2 and Resin 2.1.10 would be an excellent combo.

    I don't see how PHP can scale, without database connection pooling.
  11. 3.2.4 == SLOW

    I agree...why do they use the OLD VM and the OLD SLOW version of Tomcat?

    Just one more reason why I am skeptical of benchmarks.
  12. database connection pooling

    PHP may not have "pooling" but it does have persistent database connections. Connection pooling is only a big win if your application spends a lot of time doing things that don't involve a database.
  13. database connection pooling

    "Connection pooling is only a big win if your application spends a lot of time doing things that don't involve a database."

    What? So I'm going to do connection pooling to something that isn't a database? The reason you have connection pooling is that you don't want the web server taking the database server down. Imagine having 700 simultaneous users on the website, with PHP (and with persistent connections). You have 700 connections to the database. Creating and destroying connections is a VERY expensive operation. If you have a connection pool of, say, 10 connections, you can 'recycle' those connections and the database server won't go down.
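
    To make the 'recycle' idea concrete, here is a minimal sketch of a fixed-size pool. It is only an illustration under stated assumptions: the class BoundedPool and its methods are hypothetical names, the type parameter T stands in for a database connection, and no real pooling library works exactly like this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-size pool; T stands in for a database connection. */
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    BoundedPool(Iterable<T> resources, int capacity) {
        idle = new ArrayBlockingQueue<>(capacity);
        for (T r : resources) {
            idle.add(r); // throws IllegalStateException if more than `capacity` are supplied
        }
    }

    /** Waits up to timeoutMillis for a free resource; returns null on timeout. */
    T borrow(long timeoutMillis) throws InterruptedException {
        return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Hands a resource back so the next waiting request can reuse it. */
    void giveBack(T resource) {
        idle.add(resource);
    }

    /** Number of resources currently idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

    The pool never holds more than `capacity` resources, so the database sees at most that many connections no matter how many requests arrive; extra requests wait in borrow() instead of opening new connections.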
  14. database connection pooling

    I was talking about database connection pooling, just like your post did.

    If you have 700 simultaneous users, and your application requires a database connection to do its work, a connection pool of 10 means that 690 users are stuck waiting at any given moment. The logical conclusion is to size your database so that it can handle the expected number of simultaneous users.

    If you had an application that spends most of its time doing heavy computation, and only hits the database on maybe 1 out of 4 requests, then connection pooling would be very useful. However, most web apps seem to use the database at least a little on nearly every request, which means that connection pooling will not be a big win. The connections will be in use whether there is a pool for them or not.
  15. database connection pooling

    700 simultaneous users does not generally imply 700 threads hitting the database at once, so a smaller number of connections can support lots of users.

    As a related point, the example application has some rather strange connection code. Since servlets are multithreaded, it's generally a bad idea to store things like Connection references in the servlet's fields.

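    But the code has stuff like:

    <pre>
    public class ViewItem extends RubisHttpServlet {
      // Instance fields are shared by every thread that calls doGet()
      // on this servlet, so concurrent requests overwrite each other's values.
      private Connection _conn = null;
      private PreparedStatement _stmt;
      private ResultSet _rs;

      ...

      void doGet(...)
      {
        _conn = getConnection();
        _stmt = _conn.prepareStatement(...);
        _rs = _stmt.executeQuery();
        ...
        closeConnection();
      }
    }
    </pre>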

    Unless I'm missing something or I'm clueless about threading, that code is pretty broken.
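
    For contrast, a rough sketch of the usual fix: keep per-request state in local variables, which live on the calling thread's stack. Everything here is a stand-in I made up for illustration (PerRequestState, Resource, Handler, countMixUps); Resource plays the role of the Connection, and this is not the benchmark's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PerRequestState {
    /** Hypothetical stand-in for a per-request resource such as a JDBC Connection. */
    static final class Resource {
        final int requestId;
        Resource(int requestId) { this.requestId = requestId; }
    }

    /** One shared handler instance, like a servlet; it deliberately has no instance fields. */
    static final class Handler {
        boolean handle(int requestId) {
            Resource r = new Resource(requestId); // local variable: confined to the calling thread
            Thread.yield();                       // invite interleaving with other requests
            return r.requestId == requestId;      // still this request's own resource
        }
    }

    /** Runs concurrent requests against one Handler and counts requests that saw foreign state. */
    static int countMixUps(int threads, int requestsPerThread) throws InterruptedException {
        Handler shared = new Handler();
        AtomicInteger mixUps = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * requestsPerThread;
            exec.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    if (!shared.handle(base + i)) {
                        mixUps.incrementAndGet();
                    }
                }
            });
        }
        exec.shutdown();
        if (!exec.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("requests did not finish in time");
        }
        return mixUps.get();
    }
}
```

    Because nothing is stored on the shared Handler, no request can observe another request's resource. Moving `r` into an instance field would reintroduce the problem described above.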
  16. database connection pooling

    \Perrin Harkins\
    If you have 700 simultaneous users, and your application requires a database connection to do its work, a connection pool of 10 means that 690 users are stuck waiting at any given moment. The logical conclusion is to size your database so that it can handle the expected number of simultaneous users.
    \Perrin Harkins\

    Web applications don't work that way. Instead, people tend to think in terms of what the Mercury people call "VUsers". The concept is simple: you analyze usage like this:

       - I have X number of users
       - Out of "X", A% (active) are expected to be online at a given time, with a peak expected load of P% (peak).
       - Out of the A% or P% of users actually on-line, only S% (simultaneous) are expected to be actively hitting the server _right now_ in a worst case scenario.

    So you might say, "out of 1000 users, only 50% (A%) are expected to be on-line at once, and out of those 50% only 10% are expected to be hitting the server right this second".

    So out of a user community of 1,000 people, only 50 might be hitting the app server simultaneously.

    You can of course fiddle with the variables A%, P%, and S%, but typically for a user community <1000 you find that far fewer than 100 are hitting the app server simultaneously. In fact, often 10-20 database connections are more than sufficient.

    This all works because web applications, when they're not sending in a request, aren't putting any load on you. At most, they may take up a bit of memory for session information, and possibly have a KEEP_ALIVE socket open to the web server. But you don't need a dedicated connection per user, because the vast majority of the time a given user isn't doing a darn thing.

        -Mike
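
    The arithmetic above can be written down as a tiny helper. This is just a sketch of the calculation; the class and method names (LoadEstimate, simultaneousRequests) are made up for the example, and the percentages are passed as whole numbers.

```java
final class LoadEstimate {
    /**
     * users:          size of the user community (X)
     * onlinePercent:  percentage expected on-line at once (A% or P%)
     * hittingPercent: percentage of on-line users sending a request right now (S%)
     */
    static long simultaneousRequests(long users, int onlinePercent, int hittingPercent) {
        // Whole-number percentages keep the arithmetic exact; the result is rounded up.
        long scaled = users * onlinePercent * hittingPercent;
        return (scaled + 9_999) / 10_000;
    }
}
```

    With the numbers from the example, LoadEstimate.simultaneousRequests(1000, 50, 10) gives 50.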
  17. database connection pooling

    I'm aware that not all users "on" a site are active at any given time. I was assuming that the post I responded to meant it literally when he said "simultaneous users." If the users are not actually literally simultaneous, then they won't be taking up PHP processes either.
  18. I'm aware that not all users "on" a site are active at any given time. I was assuming that the post I responded to meant it literally when he said "simultaneous users." If the users are not actually literally simultaneous, then they won't be taking up PHP processes either.


    The difference is that with connection pooling, you get 10 _constantly_ open connections which are shared between different users. In PHP, AFAIK, you'll have to open each connection every time you access the DB, which is very costly in terms of time.
  19. "In PHP, AFAIK, you'll have to open each connection every time you access the DB"

    No, the connections are persistent. Once a PHP process opens a connection to the database it stays open for future use.
  20. database connection pooling

    "If you have 700 simultaneous users, and your application requires a database connection to do its work, a connection pool of 10 means that 690 users are stuck waiting at any given moment. The logical conclusion is to size your database so that it can handle the expected number of simultaneous users."

    You are correct, I would rather have 690 users wait than have the database server struggle and serve no one. Can you limit the total number of database connections in PHP? I don't think you can. Being able to limit the total number of connections is important IMO. Unless you've got money to keep throwing at database servers.
  21. Web Performance

    My business site is written completely in Java. It has over 1500 classes and JSP pages. Performance is excellent. Database and application server are fast and resources are optimally utilized. We see very little difference between running the site on the internet and locally in our development center. OK, the server is connected to the internet via fast lines, but I believe it is the result of good design and expert-level developers' work.

    Evgeny /Javadesk/
  22. database connection pooling

    Stating the obvious: of course, if you use a good DAO layer, such as iBatis (or Hibernate), and there are no changes to the content (e.g. a portal), then it does not run queries or need connections, since the data is cached automatically at the DAO layer, with zero code needed.
    (If it detects a change, it flushes the cache.)
    .V
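
    Roughly, the behavior described here looks like the sketch below: reads go through a cache, and a detected change flushes it. This is a hand-rolled illustration, not the iBatis or Hibernate API; CachingDao, find, flush and loadCount are hypothetical names, and the loader function stands in for the real query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache in front of a query. */
final class CachingDao<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<K, V> loader; // stand-in for the code that hits the database

    CachingDao(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, running the loader only on a cache miss. */
    V find(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** Called when the underlying data changes, so later reads reload it. */
    void flush() {
        cache.clear();
    }

    /** How many times the loader (the "database") was actually hit. */
    int loadCount() {
        return loads.get();
    }
}
```

    As long as the content does not change, repeated find() calls for the same key never reach the loader, which is why no connection is needed for them.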
  23. database connection pooling

    "Can you limit the number of total database connections in PHP?"

    You can limit the total number of simultaneous users active in an application at any given time. Apache provides a way to do this, which queues users who are not being serviced. Again, if your application requires database access to handle a request, then limiting the number of database connections without limiting the number of simultaneous users in the application is not very useful.
  24. A little dated....

    It's worth remembering that not everyone is at liberty to jump to the latest and greatest version of everything. Verifying that a complex enterprise application that is already in production works on a new platform is not trivial. Let us not speak of JDK 1.3.1 as if it were punch cards. It is not that old, and people with real business applications cannot always hop to the latest and greatest versions.

    This is why people list the versions of all technology they use. If this renders the report useless for you then that's fair enough, but it does not mean it is totally void of substance.
  25. A little dated....

    > It's worth remembering that not everyone is at liberty to jump to the latest and greatest version of everything. Verifying that a complex enterprise application that is already in production works on a new platform is not trivial. Let us not speak of JDK 1.3.1 as if it were punch cards. It is not that old, and people with real business applications cannot always hop to the latest and greatest versions.

    It is definitely correct to say that not everyone can immediately jump to the latest and greatest versions of software. However, benchmark reports that claim to represent what *is* the state of technology (versus what *was*) are what give benchmark reports in general such a bad name.

    > This is why people list the versions of all technology they use. If this renders the report useless for you then that's fair enough, but it does not mean it is totally void of substance.

    Oh, by the way, they used old versions of Apache and PHP as well, so I don't believe the PHP results are of much use to most readers either. A conclusion that can be reliably drawn from this report is "this is what you should have chosen three years ago." That's interesting (if you have existing applications built on these technologies). That's irrelevant (if you are using this kind of report as the basis for making future technology decisions). That's misleading (if you assume that three year old comparisons are an accurate predictor of your current best choice -- although I suppose benchmark report authors will blame the reader for that one :-).

    Ultimately, though, performance benchmarks are only relevant if performance is at or near the top of your selection criteria. In a large number of cases, "fast enough is fast enough" and you need to take other things into account as well. Of course, some of those other things aren't as easy to quantify, so we often fall back into the trap of focusing on what can be measured.

    Craig
  26. Agree with you.

    I am working with JDK 1.3.1 and don't plan to go to 1.4 until all the tools that I depend on for development & deployment have a JDK 1.4 version available.
  27. questionable conclusion

    They decided that PHP is not as scalable because they were able to use two machines for the Java systems by running the servlet container and webserver on separate boxes. Any PHP site of significant size would have a load balancer in front of the webserver boxes that would allow the use of multiple machines there too. I don't think this study shows anything that we can draw conclusions about scalability from.
  28. just plain wrong

    they say that PHP is not portable between databases.
    wrong!! there are PHP modules in PEAR (and have been for quite some time)
    that allow database-independent code, like Ruby's DBI or Perl's DBI.

    so the guys who wrote the report never really looked at PHP.

    Java is a great language, but "scripting" languages like PHP, Perl, Ruby or Python are often seen as nothing more than toys. but if the people who write such reports looked more closely, they would see that these languages are a great alternative to Java/C#/VB/C++. not always, but often.

    especially Ruby and Python are very clean and have many of the features Java has.
    Ruby even has all the protected, private and public stuff and looks like a powerful mixture of Smalltalk, Java and some Perl.

    to find out more
    http://www.ruby-lang.org
    http://www.python.org
    http://pear.php.net/
  29. Conclusion?

    On all tests WsServlet-DB(sync) and Ws-Servlet-DB(sync) are clear winners. It is not correct to start the conclusion with: "While Java servlets are less efficient than PHP, ..."

    Nebojsa
  30. Conclusion?

    The "sync" servlet tests used local locking to avoid getting locks in the database and this improved performance. This sounds like a very questionable practice since it wouldn't scale to a cluster, and takes something that is clearly a database activity (coordinating updates) out of the hands of the database. Without the local locking, PHP had better performance than servlets in terms of the amount of CPU used.

    I don't plan to start using PHP, but it does look like it did better on this test. Of course, as many have pointed out, these versions are fairly old.
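
    For anyone unsure what "local locking" means in practice, here is a rough sketch: the servlet container serializes a check-then-update with a Java monitor instead of relying on a database lock. It is my own illustration, not the code from the paper; LocalLockingStore, placeBid and the in-memory map standing in for a table are all made-up names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example of coordinating updates with a lock inside one JVM. */
final class LocalLockingStore {
    private final Object lock = new Object();
    private final Map<Integer, Integer> highestBid = new HashMap<>(); // stand-in for a table

    /** Records the bid if it beats the current one; returns true when accepted. */
    boolean placeBid(int itemId, int amount) {
        synchronized (lock) { // only threads in THIS JVM are serialized here
            int current = highestBid.getOrDefault(itemId, 0);
            if (amount <= current) {
                return false;
            }
            highestBid.put(itemId, amount);
            return true;
        }
    }

    /** Reads the current highest bid under the same lock. */
    int highest(int itemId) {
        synchronized (lock) {
            return highestBid.getOrDefault(itemId, 0);
        }
    }
}
```

    The monitor only exists inside a single JVM, which is exactly the concern raised above: a second servlet container in a cluster would have its own lock object and would not be coordinated with the first.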
  31. cheaper hosting for php

    Hi

    I've never used PHP, but one of the things that attracts me to it is that hosting is cheap. Basically, if I was running a site that I didn't expect to get too many hits and wanted to keep costs down, then I'd use PHP, and if the site became more successful then I'd switch to a Java-based system.

    Incidentally, I was reading earlier this week about a JSR seeking to link Java and PHP. Seems a good idea.

    Martin
  32. Nice Report!

    Thanx for the report!

    IMO, this report fits the "normal" deployment environments better. Using all Open Source products was also a good decision. At least we know that those products are ready for production environments (although I never had any doubt about this, because I myself have had very good experience with those OSS ;-)).

    What I'd like to see in the future:
    - Using Sun JDK 1.4.x or IBM JDK.
    - Using JOnAS 3.1/3.2. I had a much better experience with this version than with 2.5.
    - Using Firebird 1.0.x or maybe SAPDB instead of MySQL.
    - Eliminate EntityBeans and replace them with Hibernate.
    - Cluster the database using c-jdbc, especially for the first part of the report, where the database is always the bottleneck.
    - Give all the configurations (PHP, Servlet, EJB) the same hardware capacity. So if Ws-Serv-EJB-DB uses 4 servers, the PHP configuration should also get 4 servers, like Load-Balancer-WsPHP-DB.

    Regards,
    Lofi.
    http://www.openuss.org
  33. I am a practicing Enterprise Architect. I have followed the multiple performance comparisons with mild interest; however, let me humbly suggest that performance might not be the defining differentiator between middleware frameworks. It might not even be in the top 3. It always comes down to system requirements, of course, but I am personally far more biased towards availability, reliability, and security than performance. Slight performance differences between frameworks are very minor factors compared to these other criteria.

    Not that these comparisons lack merit, but I would prefer to see comparisons on the above topics receive as much bandwidth and discussion. I am guessing the lack of this discourse is due to the perceived difficulty of narrowing the comparisons down to a number, but as we all saw with Petshop, this is not necessarily possible even for performance.
  34. I strongly agree.

    Hello,

    your "humble" opinion seems very right to me.

    There are just too many people out there who don't seem to really understand how these things (middleware) work. If performance were the main differentiator, then let's all start programming web applications in C++.

    Too much marketing and too many poorly argued benchmarks and articles for my taste.

    And Java is mainly about elegance and patterns. PHP doesn't even come close when it comes to this.

    As I see it, the only advantage of PHP is that it's easier to learn and use (for beginners and for small tasks). It's like a "Visual Basic" for middleware.
  35. "PHP consumes less CPU time than servlets. We attribute this primarily to the fact that it executes in the same process and address space as the Web server."
    If that's the case, perhaps they should run everything inside Tomcat, instead of using the Tomcat-Apache combo? Everything would then execute in the web server process, and according to their reasoning performance should go up.

    JDK 1.3.1 for Linux also lets you use the -server option, which selects the HotSpot server VM and its more aggressively optimizing JIT compiler. That would probably be pretty useful (it doesn't look like they used it, and it isn't enabled in a default Tomcat installation).

    Just my humble opinion.