Discussions

Performance and scalability: How to avoid the 2GB memory limit of the JVM on Linux.

  1. Our servers are 32-bit Intel boxes running Sun JDK 1.4.2_04 on RedHat Linux AS. The maximum memory we are able to use is about 1.6GB. This is a big issue because caching is used extensively in our architecture. The servers themselves have 4GB of memory.

    The IBM 1.4.1-compatible JVM is able to use about that much, too.

    The JRockit 1.4.1-compatible JVM utilizes about 1.8GB, and if I specify a heap size larger than that, it prints the following message on startup:

    > java -Xms1000m -Xmx2500m Main
    Unable to aquire some virtual address space - reduced from 2560000 to 1996800 KB!

    The test program crashes with a JVM OutOfMemoryError as soon as it asks for more memory than that.

    From what I understand, the reason behind the limitation is the following: the maximum size of a contiguous memory block you can allocate on Linux/x86 is ~2GB, and the JVM heap is indeed implemented as a single contiguous memory block.
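
    For what it's worth, a quick way to check how much heap the JVM actually agreed to use is the standard Runtime API (a minimal sketch - the class name is arbitrary):

        public class HeapCheck {
            public static void main( String[] args ) {
                // Start with e.g. -Xmx2500m and compare maxMemory() to what was requested.
                long max = Runtime.getRuntime().maxMemory();
                System.out.println( "Max heap the JVM will attempt to use: "
                    + ( max / ( 1024 * 1024 ) ) + " MB" );
            }
        }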

    Does anybody have more information about this?

    Interestingly enough, this does not occur on Sun Solaris running on Sparc servers. I ran a test there - no problem at all.

    I don't want to move our servers to Sun Solaris on Sparc processors, though. And it's not just my personal preference, of course :) I am not sure my organization is looking forward to a radical change like that, including the extra expense.

    I am wondering if there is a similar limit on a 64-bit Intel-compatible architecture running 64-bit Linux? That would be nice, because staying on Linux and just changing the servers to Opteron or something similar is less pain and easier to sell to management.

    I think it is quite a shame that these days you cannot use more than 2GB of JVM memory :(
  2. Hi,
    The 32-bit machines have a maximum memory size of 2GB, while 64-bit machines (Sparc, Opteron, Intel 64, etc.) have a much larger maximum memory size.
    Best regards, Mircea
  3. Hi Irakli,

    You really want to stay away from heap sizes even approaching that size, due to the garbage collection issues that result. With Coherence you can cache your data in a Coherence Distributed Cache, which partitions the entire cache amongst the participating cluster nodes. So in your scenario you can run multiple stand-alone JVMs (running as CacheServers) to physically manage the cached items, perhaps with -Xmx and -Xms set to 256m. The data that you want to cache will still be in-memory at the application tier, but will not affect the heap size of your application server JVM.
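
    For example, the cache servers can be started stand-alone, roughly like this:

    > java -Xms256m -Xmx256m -cp coherence.jar com.tangosol.net.DefaultCacheServer

    and the application just asks for the cache by name. A minimal sketch (assuming the CacheFactory/NamedCache API and a cache named "my-cache" defined as a distributed scheme in the cache configuration):

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class CacheClient {
            public static void main( String[] args ) {
                // Joins the cluster and obtains a cache whose entries are stored
                // in the stand-alone CacheServer JVMs, not in this JVM's heap.
                NamedCache cache = CacheFactory.getCache( "my-cache" );

                cache.put( "key-1", "some cached value" );
                System.out.println( cache.get( "key-1" ) );

                CacheFactory.shutdown();
            }
        }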

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  4. Mircea,

    The 2GB memory limit is not due to a hardware limitation. The hardware limit for a 32-bit architecture is more than that (at least 4GB). Our Linux machines have 4GB of memory and Linux has no problem utilizing all of it. It's the JVM that cannot use more than 2GB, hence it loses at least half of the memory. The Sparc machine that I mentioned in the previous message as not having this problem is a 32-bit CPU server, by the way, not 64-bit. And as a matter of fact, even when it ran out of physical memory it did not crash - it started using swap. Performance was degraded, of course, but no "JVM out of memory", at least.

    Rob,
    I have heard a lot of good words about Coherence. I actually read your architecture document a year or so ago. Frankly, at that time it seemed a little bit odd - why had you chosen that path, when the solution has slightly higher network traffic than "classical" clustered cache implementations? Now I understand. Very cool :)

    Unfortunately, I am afraid, I cannot use Coherence. Our project (www.digijava.org) is intended to be open-source, hence we can use only open-source components. I don't think there's a similar implementation in the open-source domain.

    That said, I am sure some of the folks that use Digijava may want to enhance their production deployment by using Coherence. I know that Coherence was one of the first caches supported as a Hibernate plugin. Can you please provide the URLs of systems using Coherence under Hibernate in production, with high-load traffic? That would be interesting.

    That aside, I am still curious whether somebody has tried a 64-bit JVM and whether it has the same limitation due to the contiguous-memory-block constraint. Maybe JVM vendors should find a way to make the heap work with several memory blocks and remove the limitation. I wonder if there is any work being done on that front. On Google I found that this limitation is very common and occurs on both Linux and Windows.
  5. Hi Irakli,
    Rob, I have heard a lot of good words about Coherence. I actually read your architecture document a year or so ago. Frankly, at that time it seemed a little bit odd - why had you chosen that path, when the solution has slightly higher network traffic than "classical" clustered cache implementations? Now I understand. Very cool :)
    Thanks. We actually try to keep network traffic to a minimum by dynamically switching between UDP multicast and unicast (depending on calculated efficiency) at the packet level. In the Partitioned Cache that I mentioned above, 99.9% of the traffic is unicast, since we "know" where the piece of data you are requesting lives in the cluster. This allows for near-linear scaling as new nodes are added to the cluster.
    Unfortunately, I am afraid, I cannot use Coherence. Our project (www.digijava.org) is intended to be open-source, hence we can use only open-source components. I don't think there's a similar implementation in the open-source domain.
    AFAIK none of the open source implementations out there have a concept of partitioned caching.
    That said, I am sure some of the folks that use Digijava may want to enhance their production deployment by using Coherence. I know that Coherence was one of the first caches supported as a Hibernate plugin. Can you please provide the URLs of systems using Coherence under Hibernate in production, with high-load traffic? That would be interesting.
    We have a number of customers taking advantage of the existing Coherence/Hibernate plugin with great success. That coupled with the use of our Partitioned/Read-Through/Write-Behind caching technology has proven to be a very powerful combination. Drop me a line at rmisek at tangosol dot com.

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  6. Just a correction: except for very old machines, all Sparc machines have a 64-bit-wide address bus.
  7. The 2GB memory limit is imposed by the kernel -- 2.4 kernels on 32-bit systems used a 2G/2G virtual memory boundary between kernel memory and process memory. I believe that Windows uses a similar split, but I may be mistaken.

    Therefore, no single process can allocate more than 2G of RAM on these machines.

    There have been modifications available as kernel patches, if you're brave, which change the mapping to 3G/1G and allow up to 3GB of RAM to be used by a single process.

    Alternatively, in newer kernels it is possible to use a 4G/4G mapping, which allows up to 4G RAM to be allocated to a single process. The only distribution I know of that enables this mapping is Fedora Core 2, although it is a planned feature for RedHat Enterprise Linux 4.0 as well.

    Note that using the 4G/4G mapping requires an extension Intel put into its processors (starting with the Pentium Pro) called PAE, specifically to work around the 32-bit address-space limitation. Using this extension will reduce your overall memory performance somewhat -- I've seen numbers of around 10% thrown around, but your mileage may vary.
  8. The 2GB memory limit for a single process is imposed by the 2.4 kernel on 32-bit systems, but it has been addressed in the 2.6 kernel series.

    I installed Fedora Core 2 on my 32-bit x86 machine (with 3GB of RAM), and I compiled a simple C program with gcc that tests how much memory it is able to use; the maximum was 3GB, with a bit of swapping.

    Taking this into account, I tried to run jdk1.4.2_05 with these parameters:
    -Xms2000m -Xmx2000m, and the exception was: "Could not reserve enough space for object heap". Unless the maximum heap is less than 1900m, the same exception occurs (1800m for other JVM versions).

    So the real problem with the 2GB memory limit is the JVM on 32-bit systems.

    Parameters like -XX:+AggressiveHeap extend the heap up to 2.5GB, but the JVM shuts down while it is loading. I suspect that is because of my single-processor machine.

    I guess I need a special JVM that supports a larger heap.

    I found the IBM JDK, but it only gets about 100MB more.

    Well, if somebody knows how to really AVOID the 2GB memory limit, please tell me.
  9. I performed a memory test on an Athlon AMD64 laptop with Mandrake 10 AMD64 and Sun JDK 1.5 RC for AMD64.

    The test program was started with a 3.2GB memory limit setting and was able to consume 2.6GB of memory.

    The physical RAM on the laptop is only about 900MB, so most of the test was using swap space, and it became too slow after 2.6GB.

    I should have access to an Opteron with enough RAM soon, so I will repeat the same test and post the results here.

    In any case, the following is the simple test program I was using:

    public class MemLoadTester {
        public static void main( String[] args ) {

            // Fill a 2048-char array with random printable characters.
            char[] cra = new char[2048];
            for ( int i = 0; i < 2048; i++ ) {
                long rand = 70 + Math.round( Math.random() * 50 );
                cra[i] = (char) rand;
            }

            int entities  = 800000;
            int blocksize = 10240;

            String[] zz = new String[entities];

            // A Java char is 2 bytes, so each block of 10240 Strings of 2048 chars
            // is roughly 40MB, and 800000 Strings come to roughly 3.2GB.
            System.out.println( " Each box indicates the creation of ~40Meg" );

            for ( int j = 0; j < entities; j++ ) {
                zz[j] = new String( cra );
                if ( j != 0 && j % blocksize == 0 ) {
                    System.out.print( "#" );
                }
                if ( j != 0 && j % ( blocksize * 10 ) == 0 ) {
                    System.out.println( "" );
                }
            }

            System.out.println( "\n" );
            System.out.println( " Success " );
        }
    }
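
    To run it with a large heap, something along these lines (the exact flags are just an example):

    > java -Xms1000m -Xmx3200m MemLoadTester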
  10. I have looked around intensively, but it seems there is no solution for allocating a 3-4GB Java heap on 32-bit servers.
    Any successful experiences?
  11. The memory limit on 32-bit JVMs is due to the constrained maximum size of a contiguous memory block that can be allocated.

    What you can do is this: if you really need a 32-bit platform (why?), you can use Sun Solaris on Sparc, which, even being 32-bit, does not have this memory limitation.

    You can also manipulate the memory directly with something like:
    http://jguru.com/faq/view.jsp?EID=464671

    You would have to write a layer through which your cache (that's where most of the memory is used, I assume?) would go, so that it uses an alternate, JNI-based memory manager that is supposedly not constrained.
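
    As a rough illustration of the same idea without JNI: NIO direct buffers (java.nio.ByteBuffer.allocateDirect, available since 1.4) are allocated outside the -Xmx heap, though still inside the process address space, and the JVM caps their total size (via -XX:MaxDirectMemorySize on Sun JVMs, if I remember correctly). So this only works around the heap-contiguity issue, not the overall 32-bit limit. The class and names below are made up; it is just a sketch of such a layer:

        import java.nio.ByteBuffer;
        import java.util.ArrayList;
        import java.util.List;

        // Stores values as bytes in off-heap "slabs"; callers keep a small
        // int[] handle { slabIndex, offset, length } instead of the data itself.
        // Assumes individual values are smaller than SLAB_SIZE.
        public class OffHeapStore {
            private static final int SLAB_SIZE = 64 * 1024 * 1024; // 64MB per slab

            private final List slabs = new ArrayList(); // list of ByteBuffer
            private ByteBuffer current;

            private ByteBuffer slabFor( int length ) {
                if ( current == null || current.remaining() < length ) {
                    current = ByteBuffer.allocateDirect( SLAB_SIZE );
                    slabs.add( current );
                }
                return current;
            }

            /** Copies the value off-heap and returns its { slab, offset, length } handle. */
            public int[] put( byte[] value ) {
                ByteBuffer slab = slabFor( value.length );
                int offset = slab.position();
                slab.put( value );
                return new int[] { slabs.size() - 1, offset, value.length };
            }

            /** Copies the value back into an on-heap byte[]. */
            public byte[] get( int[] handle ) {
                ByteBuffer slab = (ByteBuffer) slabs.get( handle[0] );
                byte[] out = new byte[ handle[2] ];
                ByteBuffer view = slab.duplicate();
                view.position( handle[1] );
                view.get( out );
                return out;
            }
        }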

    But then, would that not be much harder than just getting 64-bit servers?

    By the way, I ran the same test (mentioned in my last post, above) on a 64-bit server (AMD64) with enough RAM, and the results were beautiful - very fast, no limit.

    Go for 64bit!
  12. I did a simple test: allocate a String array of 3M String entries of 6 chars each, check the size of the JVM, then create a HashSet with the same contents as the String array. My rough math suggested about 163MB of heap (including the HashSet). The results:

    AMD64 - 388MB before HashSet / 603MB after
    ia32 - 171MB before HashSet / 258MB after

    It is obvious to me that 64-bit JVMs are a HORRIBLE idea unless you plan to address MUCH MORE than 4GB of data: every object reference is 8 bytes instead of 4, which goes a long way towards explaining the difference above. For anything less than 4GB of RAM, 64-bit addressing is a huge waste of resources and a 32-bit JVM will provide MUCH greater capacity.

    Run this and see for yourself!

    import java.util.HashSet;

    public class CacheFootprintTest {

        static long start;

        public static void main( String[] args )
        throws Exception {

            try {

                start = System.currentTimeMillis();
                int size = 3000000;
                String[] cache = new String[size];
                for ( int i = 0; i < size; i++ ) {
                    int rand = (int) ( 300000000 * Math.random() );
                    cache[i] = Integer.toString( rand, 36 );
                }

                time( "Created cache" );
                System.gc();

                String[] terms = new String[]
                  { "AB", "CD", "CA", "B5",
                    "33", "ADC", "ADC2" };

                for ( int x = 0; x < terms.length; x++ ) {
                    String term = terms[x].toLowerCase();
                    int matches = 0;
                    start = System.currentTimeMillis();   // reset clock
                    for ( int i = 0; i < size; i++ ) {
                        if ( cache[i].indexOf( term ) >= 0 ) {
                            matches++;
                        }
                    }
                    time( "Found " + matches + " matches for " + term );
                    Thread.sleep( 100 );
                }

                time( "Paused" );
                Thread.sleep( 20000 );

                // create hash set
                start = System.currentTimeMillis();
                HashSet hs = new HashSet( size );
                time( "Created hash set" );

                for ( int i = 0; i < size; i++ ) {
                    hs.add( cache[i] );
                }

                for ( int x = 0; x < terms.length; x++ ) {
                    String term = cache[2000 * x];
                    start = System.currentTimeMillis();
                    hs.contains( term );
                    time( "Found term " + term );
                    Thread.sleep( 1000 );
                }

                Thread.sleep( 20000 );

            } catch ( Exception e ) {
                e.printStackTrace();
            }
        }

        public static void log( String s ) {
            System.out.println( s );
        }

        public static void time( String s ) {
            long duration = System.currentTimeMillis() - start;
            log( s + " in " + duration + "ms" );
        }
    }
  13. Test on Itanium

    I am using the Sun JVM with the -Xms and -Xmx options; the machine has plenty of RAM (8GB) and runs Debian Linux. Although I don't get errors like "cannot allocate enough heap" or OutOfMemory errors, my program seems to freeze after a while. My program is very memory intensive and I would definitely expect it to do quite a bit of paging. My questions are:
    a. With the AMD64 boxes, did you set an upper limit for your heap size?
    b. If the program needs more than that upper limit, does it use virtual memory via the paging mechanism?
    c. The upper limit I set is much higher than the physical RAM, and the system doesn't complain about this. Is this the right thing to do, or should I just set the upper limit to around 80% of physical RAM (recommended by Sun) and expect the system to trigger paging when it runs out?

    thanks
  14. We're on a Sun V40z with two AMD64s, running RHEL 3 with the 64-bit updates, and we're still unable to break through a 2.5GB limit. We are able to run over 3GB on Sun Solaris, as many others have posted.

    If you've been able to break through on this type of architecture, please let me know what implementation details I'm missing.
  15. 2 different boxes:

    Box 1, can set -Xmx to more than 2GB:
    > uname -a
    Linux <hostname> 2.4.21-15.11.1.ELsmp #1 SMP <time> i686 i686 i386 GNU/Linux
    > cat /proc/cpuinfo
    processor : 1
    vendor_id : GenuineIntel
    cpu family : 6
    model : 11
    model name : Intel(R) Pentium(R) III CPU family 1266MHz
    stepping : 1
    cpu MHz : 1258.306
    cache size : 512 KB
    physical id : 0
    siblings : 1
    runqueue : 1
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 2
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
    bogomips : 2510.02
    >free
                 total used free shared buffers cached
    Mem: 4063348 2355820 1707528 0 183376 1683244
    -/+ buffers/cache: 489200 3574148
    Swap: 8240448 448080 7792368
    > java -version
    java version "1.4.2_04"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
    Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)

    ============================================================
    Box 2, cannot set -Xmx to more than 2GB:
    >uname -a
    Linux <host> 2.4.9-e.57enterprise #1 SMP <time> i686 unknown
    >cat /proc/cpuinfo
    processor : 3
    vendor_id : GenuineIntel
    cpu family : 15
    model : 2
    model name : Intel(R) Xeon(TM) CPU 3.06GHz
    stepping : 9
    cpu MHz : 3056.864
    cache size : 512 KB
    physical id : 0
    siblings : 2
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 2
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2
    ss ht tm
    bogomips : 6107.95
    >free -t
                 total used free shared buffers cached
    Mem: 6178776 4667384 1511392 0 154972 4238652
    -/+ buffers/cache: 273760 5905016
    Swap: 6192456 30104 6162352
    Total: 12371232 4697488 7673744
    >java -version
    java version "1.4.2_07"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_07-b05)
    Java HotSpot(TM) Client VM (build 1.4.2_07-b05, mixed mode)
  16. I see that no one has posted since 2005; I was wondering if anyone has figured out how to get around this problem.

    One way I found to get up to about 3.5GB in the JVM was to run my Java under a dedicated user and increase that user's memory allocation limit to about 0xfffffffffff or something like that, which is the max a 32-bit system can use. But it's still not enough for what I'm trying to do.

    So if anyone has a new suggestion or some resource to help get through this, that would be awesome.

    I have installed the RHEL 4 hugemem kernel, and I have about 8GB of memory and plenty of swap, on a 32-bit 2.8GHz Xeon. I think it's 32-bit; if it were 64-bit I probably wouldn't be running into this error.

    So please, if someone has figured out a way to get around these limits in Linux, let us know.

    thanks,

    Nic
  17. http://kerneltrap.org/node/2450
  18. For distributable web applications I recommend application-server-level virtualization with load balancing. GlassFish v2 with a cluster configuration and Sun Web Server 6.1 with a proper LB plugin configuration will provide effective CPU and memory utilization. (We prefer Sparc boxes with the Solaris OS on them.) I hope this suggestion helps to answer this question years later :)