Discussions

News: The Performance Paradox of the JVM: Why More Hardware Means More Failures

As computer hardware gets cheaper and faster, administrators managing Java-based servers are frequently encountering serious problems with their runtime environments.

While our servers are getting decked out with faster and faster hardware, the Java Virtual Machines (JVMs) running on them can't effectively leverage the extra hardware without hitting a wall and temporarily freezing. Sometimes it's a ten-second freeze, while other times it's ten minutes, but every time it's enough to frustrate users, cause retail sites to lose customers, cause business sites to start fielding problem calls at their help desks, and cause server-side administrators to become increasingly frustrated with the Java-based environment they are tasked with managing.

    So, what's the problem? What's causing all of these JVMs to pause? 

    Find out about the performance paradox of the Java Virtual Machine, and what the open source community is doing to solve the problem:

    Tackling the Performance Paradox of the Java Virtual Machine by Cameron McKenzie

Here is an interesting post by Cameron Purdy: The challenge with GC in Java and .NET


    It says that, while GC algorithms have improved in the past:

(1) we are increasingly hitting a wall these days ("A series of amazing advances in GC algorithms have thus far masked this inevitable consequence, but the advances are already showing diminishing returns, while the upward pressure on the size of the store has not abated and the dramatic progress of processor performance has not resumed")

    and

    (2) the core part of GC is sequential ("While GC algorithms have advanced dramatically in terms of parallelism, the remaining non-parallelized (and possibly non-parallelizable) portion of GC is executed as if by a single thread")

So, while the volume of data being processed keeps growing with more and more cores, the GC impact grows too, and the JVM freezes more and more often, blocking an ever-increasing number of cores!
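Purdy's second point is essentially Amdahl's law applied to GC: even a small serial fraction caps the speedup that extra cores can deliver. A minimal, illustrative Java sketch (the 10% serial fraction is an assumed number for illustration, not a measurement):

```java
public class AmdahlGc {
    // Amdahl's law: speedup(N) = 1 / (serial + (1 - serial) / N)
    static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        double serial = 0.10; // assume 10% of a GC cycle is single-threaded
        for (int cores : new int[] {1, 4, 16, 64, 256}) {
            System.out.printf("%3d cores -> %.1fx speedup%n", cores, speedup(serial, cores));
        }
        // No matter how many cores you add, the speedup can never exceed
        // 1/serial = 10x here, so GC work stops shrinking while heaps keep growing.
    }
}
```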


Are GC-ed languages, like Java, good enough for big volumes of data with multiple cores?

Or are we going to see new languages due to these hardware constraints?

    ...


    Dominique

    http://www.jroller.com/dmdevito

Great comment. And going even beyond multiple cores to the whole cloud, it becomes an even greater concern. I'd be interested to hear Cameron Purdy's take on the Managed Runtime Initiative, and see what type of work Oracle is doing in that particular space.

  4. You need to take better care of your reputation.  They've got very good stuff, for a niche, but it is distasteful to find them being plugged so often and this time in a fairly sneaky way.

Azul was a great reference

The guys at Azul were a great reference for the article, and it's a real problem. And while Azul is behind the open source project, it's just that: open source, and it's taking the industry in the right direction.

  6. So for now I see four kinds of solutions for this:

1. Invent a GC algorithm that is (almost) fully parallelizable (maybe easier said than done, but until it's actually proven that something must be serial, this seems like the best solution).
2. Start partitioning your app into several cooperating processes (vertical partitioning)
3. Start partitioning your app into several independent processes (horizontal partitioning)
4. Bring some concepts of Java RT into the mainstream -> for some memory-intensive operations, ditch GC and manage memory yourself, C++ style.

Of course some combinations are possible. You can easily combine 2 and 3 to create N x M partitions. Some fork/join algorithms would also be suitable for processing by multiple processes, although work stealing would become a little more expensive. Basically you would be looking at how you deal with computations in a traditional grid, but instead of the nodes being physically different machines, they are different processes on a many-core machine.
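Within a single process, Java's fork/join framework (java.util.concurrent, standard since Java 7) already does the work-stealing part; the sketch below sums an array with it. Spreading the same split across separate JVM processes, as suggested above, would need an IPC layer on top:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class PartitionedSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int from, to;

    PartitionedSum(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {      // small enough: sum sequentially
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;       // otherwise split in two
        PartitionedSum left = new PartitionedSum(data, from, mid);
        PartitionedSum right = new PartitionedSum(data, mid, to);
        left.fork();                       // left half may be stolen by an idle worker
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long sum = new ForkJoinPool().invoke(new PartitionedSum(data, 0, data.length));
        System.out.println(sum); // 499999500000
    }
}
```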

    If your memory is mainly taken up by many concurrent users, it might be easier. If you have a single machine that serves 500 users, split it up into two virtual machines that each serve 250 users.

Managing memory yourself again would not be suitable for a typical request/response CRUD-style app with many users, but would apply to some special operations that are notorious for using up a lot of memory (e.g. some periodic filtering of a data feed).


I think your last item

Bring some concepts of Java RT into the mainstream -> for some memory-intensive operations, ditch GC and manage memory yourself, C++ style.

is the most important thing that should be introduced in Java. Right now the GC has to handle all objects in memory, but if you could say "don't touch this, I will manage it myself", it would increase GC performance significantly, especially in the case of CMS on multi-core CPUs.

Imagine an in-memory database whose objects are not changed frequently: the GC still has to handle all of those objects, since that is its responsibility. It consumes CPU, it stops the application (in the case of a full GC), and CMS is actually not able to compact memory at all if there are a lot of objects on the heap.
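Plain Java already offers one escape hatch for this case: direct ByteBuffers live outside the garbage-collected heap, so a large, rarely-changing data set stored there never has to be traced or compacted by the collector. A minimal sketch (the 16-byte record layout is made up for illustration):

```java
import java.nio.ByteBuffer;

public class OffHeapStore {
    private static final int RECORD_SIZE = 16;   // two longs per record (hypothetical layout)
    private final ByteBuffer buffer;             // allocated outside the GC heap

    public OffHeapStore(int records) {
        buffer = ByteBuffer.allocateDirect(records * RECORD_SIZE);
    }

    public void put(int index, long key, long value) {
        int offset = index * RECORD_SIZE;
        buffer.putLong(offset, key);
        buffer.putLong(offset + 8, value);
    }

    public long getValue(int index) {
        return buffer.getLong(index * RECORD_SIZE + 8);
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1_000_000); // ~16 MB the GC never traces
        store.put(42, 1234L, 5678L);
        System.out.println(store.getValue(42)); // 5678
    }
}
```

The trade-off is exactly the C++-style one from the list above: you give up object references and take on manual serialization and layout, in exchange for removing that memory from every GC cycle.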

Bring some concepts of Java RT into the mainstream -> for some memory-intensive operations, ditch GC and manage memory yourself, C++ style.

is the most important thing that should be introduced in Java.

It's indeed a very interesting topic. Do remember that basically "Java" already has this: Java RT is a full and official member of the Java family. Take a look at this: http://java.sun.com/developer/technicalArticles/Programming/rt_pt1/

and especially this:


    Memory Areas

    The RTSJ provides for several means of allocating objects, depending on the nature of the task doing the allocation. Objects can be allocated from a specific memory area, and different memory areas have different GC characteristics and allocation limits.

    • Standard heap. Just like standard VMs, real-time VMs maintain a garbage-collected heap, which can be used by both standard (JLT) and RTT threads. Allocating from the standard heap may subject a thread to GC pauses. Several different GC technologies can balance throughput, scalability, determinism, and memory size. NHRTs cannot use the standard heap in order to be protected from this balancing, which is a source of unpredictability. 
       
    • Immortal memory. Immortal memory is a non-garbage-collected area of memory. Once an object is allocated from immortal memory, the memory used by that object will never, in principle, be reclaimed. The primary use for immortal memory is so that activities can avoid dynamic allocation by statically allocating all the memory they need ahead of time, and managing it themselves. 
       
      Managing immortal memory requires greater care than managing memory allocated from the standard heap, because if immortal objects are leaked by the application, they will not normally be reclaimed. In this case, there is no easy way to return memory to the immortal memory area. Note: If you are familiar with C/C++, you will probably recognize immortal memory as being similar to malloc() and free(). 
       
    • Scoped memory. The RTSJ provides a third mechanism for allocation called scoped memory, which is available only to RTT and NHRT threads. Scoped-memory areas are intended for objects with a known lifetime, such as temporary objects created during the processing of a task. Like immortal memory, scoped-memory areas are uncollected, but the difference is that the entire scoped-memory area can be reclaimed at the end of its lifetime, such as when the task finishes. 
       
      Scoped-memory areas also provide a form of memory budgeting. The maximum size of the scoped area is specified when it is created, and if you attempt to exceed the maximum, an OutOfMemoryError is thrown. This ensures that a rogue task does not consume all the memory and thereby starve other -- perhaps higher-priority -- tasks of memory.

So there you have it: immortal memory and scoped memory are both concepts that have already been defined for Java the language and the JVM. Now 'all it needs' is to be brought into the mainstream. Easier said than done probably, but still ;)
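On a standard JVM you can approximate the scoped-memory idea by hand: preallocate one fixed-size area up front, hand out slices during a task, and reclaim the whole area in O(1) by resetting a pointer when the task finishes. A toy sketch of that pattern (this is not the RTSJ API itself, which requires a real-time VM; it only mimics the budgeted, bulk-reclaimed behavior described above):

```java
import java.nio.ByteBuffer;

// Toy arena: everything allocated during a task is reclaimed at once,
// mimicking how an RTSJ scoped-memory area is emptied when its task ends.
public class ScopedArena {
    private final ByteBuffer area;
    private int top = 0;

    public ScopedArena(int maxBytes) {
        area = ByteBuffer.allocateDirect(maxBytes); // fixed budget, like a scoped area's max size
    }

    public int allocate(int bytes) {
        if (top + bytes > area.capacity()) {
            throw new OutOfMemoryError("scoped area exhausted"); // a rogue task hits its budget
        }
        int offset = top;
        top += bytes;
        return offset; // caller reads/writes the area at this offset
    }

    public void reset() {
        top = 0; // end of task: the whole scope is reclaimed in one step, no GC involved
    }

    public int used() { return top; }
}
```

Each request or task calls allocate() as needed and reset() on completion, so none of that memory ever reaches the collector.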

  9. So for now I see four kinds of solutions for this:

1. Invent a GC algorithm that is (almost) fully parallelizable (maybe easier said than done, but until it's actually proven that something must be serial, this seems like the best solution).
2. Start partitioning your app into several cooperating processes (vertical partitioning)
3. Start partitioning your app into several independent processes (horizontal partitioning)
4. Bring some concepts of Java RT into the mainstream -> for some memory-intensive operations, ditch GC and manage memory yourself, C++ style.


5. Use a stateless architecture for your server applications and most of your objects will always be young. It helps with 3 too (load balancing).


Sun's JVM has various -XX GC options, including several parallel and concurrent GC options. How do these options help or not help?
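For reference, the main HotSpot collector choices are selected with flags like these (heap sizes and the app name are placeholder values; exact flag availability depends on the JDK version):

```shell
# Throughput (parallel) collector: parallel young- and old-generation collection
java -XX:+UseParallelGC -XX:+UseParallelOldGC -Xms2g -Xmx2g -jar app.jar

# Concurrent mark-sweep: shorter pauses, but non-compacting and prone to fragmentation
java -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Xms2g -Xmx2g -jar app.jar

# Pause-time goal hint for the parallel collector (a goal, not a guarantee)
java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -jar app.jar

# Log GC activity to see where the pauses actually come from
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar app.jar
```

They help with throughput and average pause length, but the stop-the-world phases discussed above (and the full compacting collection CMS falls back to when the heap fragments) remain, which is exactly the wall the article describes.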

This isn't a problem

With a 1-2 GB heap and many instances, how is this an issue? Say your server has 32 GB (which is a generous amount of memory), so you run 16-24 JVMs. The ratio of JVMs to usable memory is still perfectly acceptable, particularly given the thesis of the article, which is that servers are getting larger and more capable all the time.

As far as managing all those servers goes, app servers like WebSphere Application Server give you a nice console where you can maintain and monitor all of them. Perhaps some app servers don't have such facilities out of the box, but that is not an architectural deficiency, simply a missing tool.

I have worked on big consumer sites that use JVMs on big servers with 32+ GB (a total of 100-200 GB of RAM), and this has never been an issue in the smallest way.