Discussions

News: New Article: Scaling Your Java EE Applications

  1. Java applications can be scaled vertically (on a single system), or horizontally (across multiple systems). But to do either, you have to understand all parts of the system and software. Not doing so could defeat the purpose of adding system resources or more systems. Wang Yu presents some surprising results of Java application scalability based on his experiences in a performance laboratory. The first installment of this article discusses scaling vertically. Read the Article

    Threaded Messages (40)

  2. broken link

    The link to the article points to the TSS homepage.
  3. Re: broken link

    Try here http://www.theserverside.com/tt/articles/article.tss?l=ScalingYourJavaEEApplications
  4. CAS operations

    Shouldn't you re-evaluate newValue in code listing 5 after the CAS operation fails?
  5. Yes, you are right

    The sample code is incorrect. To get the described behavior - a safe increment - it should be changed to: while (value.compareAndSwap(oldValue, newValue) != oldValue) { oldValue = value.getValue(); newValue = oldValue + 1; } As written, the code from the article sets the value calculated on the first iteration, so if another thread succeeds in changing it in the meantime, that change is lost.
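    For reference, the standard retry idiom with java.util.concurrent.atomic looks roughly like the sketch below (it uses AtomicInteger and compareAndSet; the article's listing uses its own value/compareAndSwap wrapper, which is assumed here to behave the same way):

        import java.util.concurrent.atomic.AtomicInteger;

        public class SafeCounter {
            private final AtomicInteger value = new AtomicInteger(0);

            // Recompute oldValue and newValue on every iteration and retry
            // until the CAS succeeds, so no concurrent increment is lost.
            // (In practice, value.incrementAndGet() does exactly this.)
            public int increment() {
                int oldValue, newValue;
                do {
                    oldValue = value.get();
                    newValue = oldValue + 1;
                } while (!value.compareAndSet(oldValue, newValue));
                return newValue;
            }
        }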
  6. Hi, Ivan and Stoyan, Thank you so much for pointing out the error. I will modify it and ask the editor to post the corrected code. Wang Yu
  7. Very nice article

    The concept of lock-free data structures in the java.util.concurrent package seems identical to optimistic locking in ORM.
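    The analogy is roughly this: a CAS loop retries when the value it read has changed underneath it, while ORM-style optimistic locking retries (or fails) when a row's version has changed. A minimal sketch, assuming a JPA entity with a @Version column (the entity and field names are illustrative only):

        import javax.persistence.Entity;
        import javax.persistence.Id;
        import javax.persistence.Version;

        @Entity
        public class Account {
            @Id
            private Long id;

            private long balance;

            // JPA bumps this column on every successful update; if another
            // transaction changed the row in the meantime, the commit fails
            // with an OptimisticLockException, much like a failed compareAndSet.
            @Version
            private int version;
        }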
  8. When is the second part coming?

    Pretty good article; looking forward to seeing the second part.
  9. The expectations for 'real-time Java' aside, I have been surprised that NIO has no unmap() for memory-mapped buffers. What am I supposed to do about that? Is Java REALLY ready for mission-critical applications? Very doubtful, isn't it?
  10. I don't see anything new

    The stuff in the article seems to be a rehash of existing knowledge. Is there anything new, or is it just a rehash? peter
  11. Re: I don't see anything new

    This is really a very useful article, so I wrote a blog post about it: http://blogs.deepal.org/2008/07/scaling-java-ee-applications.html
  12. It's more like plagiarism

    It looks like a reprint of http://www.ibm.com/developerworks/java/library/j-jtp11234/ - I think the original author should at least be credited.
  13. Re: It's more like plagiarism

    Thanks for pointing that out, Alex. I've just read the DeveloperWorks article, and while the content is similar to the middle part of this one, I didn't see direct plagiarism. I've asked the author whether he was aware of that article, and I think a reference to it in the text is appropriate.
  14. Even if it's not plagiarism

    The content is basically a rehash of existing material found on the internet. Maybe it's new to the author, who apparently doesn't realize it's old stuff. peter
  15. Old content may also be useful

    Hi, Peter, Thank you for pointing out that the content in this article is not about new technologies. But it is not rehashed from the internet. I simply summarized the experience from the projects tested in our performance lab, and all the case studies are real-world projects. I just hope to be of a little help to upcoming Java projects dealing with scalability, by helping them avoid some of the bad practices seen in failed projects. Thank you and Best Regards Wang Yu
  16. Re: Even if it's not plagiarism

    The content is basically a rehash of existing material found on the internet. Maybe it's new to the author, who apparently doesn't realize it's old stuff.

    peter
    So? There are new and young generations of people joining the IT industry who would like to read something like this. What is the point of your post? The original poster was trying to help the community; it's not as if he was putting in a shameless plug and trying to sell something at the same time.
  17. Re: Even if it's not plagiarism

    The content is basically a rehash of existing material found on the internet. Maybe it's new to the author, who apparently doesn't realize it's old stuff.

    peter


    So? There are new and young generations of people joining the IT industry who would like to read something like this.

    What is the point of your post?

    The original poster was trying to help the community; it's not as if he was putting in a shameless plug and trying to sell something at the same time.
    You posted the same thing twice; it seems you really want to be flamed.
  18. That's funny

    I expect to be flamed for pointing out the obvious. For example, others like Cliff Click have covered lock-free techniques in much greater detail and depth. Cameron has covered scaling with data grids, along with a few others. There are dozens of articles on the proper usage of synchronized. Ari has covered Terracotta in great detail, and some of the talks are on Google Video. My biased take is that the article shows a lack of awareness of prior literature and doesn't attempt to take it further. If a college student were to write a paper and not cite prior literature, would it be considered a good paper? Maybe it's just me, but I would much rather read in-depth articles that have more meat and content. I wear asbestos underwear, so feel free to flame on. peter
  19. Re: That's funny

    I expect to be flamed for pointing out the obvious. For example, others like Cliff Click have covered lock-free techniques in much greater detail and depth. Cameron has covered scaling with data grids, along with a few others. There are dozens of articles on the proper usage of synchronized. Ari has covered Terracotta in great detail, and some of the talks are on Google Video.

    My biased take is that the article shows a lack of awareness of prior literature and doesn't attempt to take it further. If a college student were to write a paper and not cite prior literature, would it be considered a good paper? Maybe it's just me, but I would much rather read in-depth articles that have more meat and content.

    I wear asbestos underwear, so feel free to flame on.

    peter
    Given that there are new and young people coming into IT, such as recent graduates, who might not have read the articles you mentioned, you opt for a lame flame 'pointing out the obvious' instead of being the nice guy and saying: "Here you go guys, some additional links you might find useful on this topic, straight from my bookmarks: link1, link2". Not all articles need to be hard-core, PhD-level material; some are introductory, some are not. If you don't like the article, go read http://citeseerx.ist.psu.edu/ or something.
  20. Hi, Alex, Thank you for pointing it out. I have now attached all the reference resources (including the second part of the article). Thanks and Best Regards Wang Yu
  21. Vertical Scaling

    Vertical scaling used to be separating the functionality into slices vertically when looking at the tiers. That is, separating the database/persistence into one vertical layer, separating the web server into another vertical layer and putting the application server into a third vertical layer. Each of these vertical layers can then be scaled horizontally (by adding more machines). It seems the definition changed sometime when I wasn't looking.
  22. pretty good article, looking forward to seeing the second part
  23. 2nd part - I doubt it

    Given that the first article was basically just a copy of an article by Brian Goertz written in 2004, I have my doubts that we will see a second part.
  24. Typo

    Typo... The name is Brian Goetz and I just noticed that someone already pointed it out incl. a link to the original article.
  25. Re: Typo

    Typo... The name is Brian Goetz and I just noticed that someone already pointed it out incl. a link to the original article.
    As the publisher of this article (and certainly unaware of the original piece), I think it would be in TSS's best interest to take this article off the site, just to be safe. The first half of this article is too similar to the original whose link was provided... Whoever Brian is, I'm sure he will not be pleased.
  26. Brian Goetz

    The first half of this article is too similar to the original whose link was provided... Whoever Brian is, I'm sure he will not be pleased.
    Brian Goetz is the author of "Java Concurrency in Practice" published by Addison Wesley in 2006 (besides being the author of a lot of other articles)
  27. Re: Brian Goetz

    The first half of this article is too similar to the original whose link was provided... Whoever Brian is, I'm sure he will not be pleased.

    Brian Goetz is the author of "Java Concurrency in Practice" published by Addison Wesley in 2006 (besides being the author of a lot of other articles)
    http://java.sun.com/developer/technicalArticles/Interviews/goetz_qa.html
  28. Brian who???

    Brian should be known to all Java developers. If you don't know him, then you need to be reading more. Please tell me you know who Martin Fowler is...
  29. The reference resources

    1. Scalability definition in Wikipedia: http://en.wikipedia.org/wiki/Scalability
    2. Java theory and practice: Going atomic: http://www.ibm.com/developerworks/java/library/j-jtp11234/
    3. Javadoc of the atomic APIs: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/package-summary.html
    4. Alan Kaminsky. Parallel Java: A unified API for shared memory and cluster parallel programming in 100% Java: http://www.cs.rit.edu/~ark/20070326/pj.pdf
    5. JOMP - an OpenMP-like interface for Java: http://portal.acm.org/citation.cfm?id=337466
    6. Google MapReduce white paper: http://labs.google.com/papers/mapreduce-osdi04.pdf
    7. Google Bigtable white paper: http://labs.google.com/papers/bigtable-osdi06.pdf
    8. Hadoop MapReduce tutorial: http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html
    9. Memcached FAQ: http://www.socialtext.net/memcached/index.cgi?faq
    10. Terracotta: http://www.terracotta.org/
    Thank you Wang for the article. I think it is valuable to continually educate new J2EE programmers on the subject of scaling. My particular question is in reference to vertical scaling. Can you give more details on why you would specifically recommend a maximum 3 GB heap, and which GC/JVM were you using for testing? I've found 4-5 GB heaps performed adequately without overly long pauses, but then again I realize pauses are largely application dependent. 30-minute pauses seem unnecessarily long. Was the 12 GB telco app you referenced using the concurrent mark-sweep option? I would love to see more information on this subject, and whether you have more detailed results from your testing. Thanks again!
    Thank you Wang for the article. I think it is valuable to continually educate new J2EE programmers on the subject of scaling.

    My particular question is in reference to vertical scaling. Can you give more details on why you would specifically recommend a maximum 3 GB heap, and which GC/JVM were you using for testing?

    I've found 4-5 GB heaps performed adequately without overly long pauses, but then again I realize pauses are largely application dependent.

    30-minute pauses seem unnecessarily long. Was the 12 GB telco app you referenced using the concurrent mark-sweep option? I would love to see more information on this subject, and whether you have more detailed results from your testing. Thanks again!
    Hi, Vincent, Just as you mentioned, GC pauses are largely application dependent. If your heap holds too many long-lived objects (such as a big cache), even the "ConcurrentMarkSweep" option will show low throughput when GC happens. But if most of your objects are short-lived, increasing the heap size may help performance. Best Regards Wang Yu
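    Purely as a rough illustration of the options being discussed (these are standard Sun HotSpot flags of that era; the sizes and the application jar name are placeholders, not recommendations):

        java -Xmx3g -Xmn1g -XX:+UseConcMarkSweepGC \
             -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
             -jar yourapp.jar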
  32. The worst solution is to put the "synchronized" keywords on the static methods, which means it will lock on all instances of this class
    Well, this is simply not true. When you put "synchronized" on a static method (let's say the class is called HelloWorld), it means that it will lock on the HelloWorld.class object (and not on all instances of HelloWorld). Or am I missing something?
  33. The worst solution is to put the "synchronized" keywords on the static methods, which means it will lock on all instances of this class


    Well, this is simply not true. When you put "synchronized" on a static method (let say the class is called HelloWorld) it means that it will lock on the HelloWorld.class object (and not on all instances of HelloWorld).
    Or am I missing something?
    Hi, Fredi, Thank you for pointing it out. You are right: if class A has a "synchronized" static method M1 and a "synchronized" non-static method M2, A.M1() and a.M2() will not compete for the same lock. But if class B has a non-static method M3 which calls A.M1(), then all instances of B will compete for the same lock (the one on A.class) when M3() is called. Thanks again! Wang Yu
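    To make the distinction concrete, here is a minimal sketch (the class and method names are illustrative, not taken from the article):

        class A {
            // A static synchronized method locks on A.class, a single lock
            // shared by every caller in the JVM.
            static synchronized void m1() { /* critical section */ }

            // An instance synchronized method locks only on the particular
            // A object it is called on ("this").
            synchronized void m2() { /* critical section */ }
        }

        class B {
            // m3() itself is not synchronized, yet every B instance that calls
            // it still contends for the single A.class lock inside A.m1().
            void m3() { A.m1(); }
        }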
  34. Filtering the garbage

    There's a lot of information out there about performance, the majority of which is irrelevant for most applications. For instance, did the introduction of StringBuffer really make a relevant difference? I found it particularly useful to have the insight of someone running scenarios and quantifying performance measures on a professional, daily basis, so that the rest of us can identify approaches that may actually make a real difference.
  35. Non-blocking IO

    The Tomcat vs. Glassfish example used to examine the benefits of non-blocking IO is terrible. They gave Tomcat 1000 threads! They even admit this is a poor choice. You can handle a similar number of concurrent requests as with Grizzly (what Glassfish uses) by tuning the thread pool of Tomcat. Grizzly alleviates the need to tune, but there is a performance penalty.
  36. Re: Non-blocking IO

    The Tomcat vs. Glassfish example used to examine the benefits of non-blocking IO is terrible. They gave Tomcat 1000 threads! They even admit this is a poor choice. You can handle a similar number of concurrent requests as with Grizzly (what Glassfish uses) by tuning the thread pool of Tomcat. Grizzly alleviates the need to tune, but there is a performance penalty.
    We tried giving Tomcat a small number of threads (100~200), but sometimes got a lot of "connection refused" or "time out" messages, which may mean there were no threads available for new requests, even though the CPU resources were not fully utilized (no more than 50%). Wang Yu
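    For readers new to the idea: with non-blocking IO a single selector thread can multiplex many connections instead of dedicating one blocked thread per request. A minimal sketch of that pattern with plain java.nio (not the actual Grizzly or Tomcat code, and with error handling omitted):

        import java.net.InetSocketAddress;
        import java.nio.ByteBuffer;
        import java.nio.channels.*;
        import java.util.Iterator;

        public class EchoSelector {
            public static void main(String[] args) throws Exception {
                Selector selector = Selector.open();
                ServerSocketChannel server = ServerSocketChannel.open();
                server.socket().bind(new InetSocketAddress(8080));
                server.configureBlocking(false);
                server.register(selector, SelectionKey.OP_ACCEPT);

                ByteBuffer buffer = ByteBuffer.allocate(4096);
                while (true) {
                    selector.select();                 // one thread waits on all channels
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {      // new connection: register it, no new thread
                            SocketChannel client = server.accept();
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) { // data ready: echo it back
                            SocketChannel client = (SocketChannel) key.channel();
                            buffer.clear();
                            int n = client.read(buffer);
                            if (n < 0) {
                                client.close();
                            } else {
                                buffer.flip();
                                client.write(buffer);
                            }
                        }
                    }
                }
            }
        }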
  37. Some comments

    First of all, thank you for a great article.

    Under the heading "Single thread task problem" you write at the beginning of the text that you ran your test on an SMP system with an 8-CPU Sparc. Later, in the third paragraph under the same heading, you write that you ran the test on a 4-CPU Sparc. A writing mistake?

    One other discussion, about memory scaling. You discourage utilizing a JVM with a large heap (in your example you used a 12 GB Java heap) due to garbage collection issues, which may in turn be a result of a non-optimized application. So, say that we have an application suffering from non-optimized memory utilization, and that we ignore the possibility of scaling our system horizontally (e.g. using multiple system nodes) and just have the option of scaling the system vertically. Say that our maximum workload will result in usage of about 16 GB of Java heap plus approximately 25% more of non-heap and native memory, so we end up with a process using 20 GB of physical memory. Now we have two strategies for our production environment:

    1. Either we can set up one huge Java process using -Xmx20g.
    2. Or we can set up 4 separate JVMs using a 4 GB heap each.

    What is the best strategy?

    The main disadvantage of strategy 1 is the potential for long Full GC pauses. This problem is not easy to solve, but it can probably be solved. For example, we may test various garbage collection tuning options, such as letting the Full GC run more frequently. This keeps the total GC overhead at the same level, but the individual pauses will be shorter. A more certain (and more expensive) way to shorten the pauses is to utilize more CPUs/cores in our system. Then we can exploit the great advantage of the throughput collector and schedule the GC work on multiple cores, where the memory is split into segments and each segment is collected by a separate GC thread scheduled on one of the CPUs/cores. So, if we utilize 4 CPUs and 32 cores, we will be able to utilize up to 32 GC threads in our application. If we buy fast CPUs, the GC pauses will be shorter.

    The disadvantage of strategy 2 is that we have more processes to maintain. Another disadvantage is that, if we do not use shared memory between these JVMs, the overall memory usage will be higher because of non-shared class data. The big advantage of this strategy is that the Full GC pauses will be shorter, and the probability that the Full GCs conflict with each other (i.e. run simultaneously) depends on how often and how long the Full GC runs. This is also a GC tuning issue.

    I would personally recommend the second strategy, for several experience-based reasons:

    1. Redundancy.
    2. Maintainability. The processes are managed separately, so you can restart one at a time if necessary.
    3. Full GC pause time and the conflict probability factor (described above).
    4. Troubleshooting. Have you ever tried to take a heap dump of a 4 GB JVM? Then you can imagine what it is like to take a heap dump of a 20 GB JVM. It is much easier to troubleshoot a small process than a large one, in every case.
    5. Hot locks will have a smaller footprint.

    Any comments?
  38. Correction

    It should be: Either we can set up one huge Java process using -Xmx16g.
  39. Re: Some comments

    Hi, Robert, Thanks for your comments. I also prefer the second strategy. For strategy 1, letting the Full GC run more frequently may be a good solution, but it may not make effective use of the given memory. For example, letting the Full GC run when the old generation is half full means we waste half the space of the old generation. Thanks and Best Regards Wang Yu
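    For what it's worth, on the Sun HotSpot JVM of that era the "collect when the old generation is half full" policy corresponds to the CMS initiating-occupancy flags; a hedged example of how that would be expressed (the value 50 and the jar name are illustrative only, not a recommendation):

        java -Xmx4g -XX:+UseConcMarkSweepGC \
             -XX:CMSInitiatingOccupancyFraction=50 \
             -XX:+UseCMSInitiatingOccupancyOnly \
             -jar yourapp.jar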
  40. Thanks, nice article

    Thanks much for the nice articles. Please keep up the good work, and ignore the trolls that disparage your efforts. Nobody knows everything (in spite of what they claim), and information sharing is always good. Best Regards, Rick
  41. Horizontal scaling recipe here...

    While waiting for the second part: http://guysblogspot.blogspot.com/2008/08/unlimited-scaling-easy.html Cheers