News: New Article: Scaling Your Java EE Applications - Part 2

  1. Java applications can be scaled vertically (on a single system), or horizontally (across multiple systems). But to do either, you have to understand all parts of the system and software. Not doing so could defeat the purpose of adding system resources or more systems. Wang Yu presents some surprising results of Java application scalability based on his experiences in a performance laboratory. The second installment of this series discusses scaling horizontally.

    Threaded Messages (26)

  2. Hi, somehow the link to the Part 2 article can't be clicked in Firefox (Linux); I had to use View Source to find the URL. Also, the "print friendly version" link at the end of the article brings up Part 1 instead of Part 2. Chester
  3. Hi, Try this: http://www.theserverside.com/tt/articles/content/ScalingYourJavaEEApplicationsPart2/article.html X
  4. Speaking of Hadoop and MapReduce, it'll be interesting to see how analytic databases adopt the MapReduce construct. Check out Aster Data - the first fully integrated MPP relational database to seamlessly integrate MapReduce: http://www.asterdata.com/product/mapreduce.html
  5. Where is Part 1?
  6. #Part 1 Try this

    http://www.theserverside.com/tt/articles/content/ScalingYourJavaEEApplications/article.html
  7. It links to Part 1 of the article.
  8. JBossCache

    JBossCache with Buddy replication enabled is quite powerful as well and should scale very nicely. See http://jbossworld.com/downloads/pdf/wednesday/JBOSS_1-150pm_ClusterTuning_Bela_Ban.pdf
  9. The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. Partitions can be associated with different tablespaces, which can be on different volumes, so the data partitions can be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created). Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and the index then depends on all partitions.
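
    To make the hash and range options mentioned in this post concrete, here is a minimal Java sketch of the two routing rules. It is only an illustration and not how Oracle implements partitioning; the class name PartitionChooser, its method names, and the sample bounds and disk names are all assumptions made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: picks a partition for a key either by hash or by range. */
final class PartitionChooser {

    /** Hash partitioning: spreads keys over a fixed number of partitions. */
    static int byHash(String key, int partitionCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Range partitioning: each entry maps the lowest key of a range to a partition name. */
    static String byRange(TreeMap<Integer, String> lowerBounds, int key) {
        // floorEntry finds the greatest lower bound that is <= key.
        Map.Entry<Integer, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("No partition covers key " + key);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> lowerBounds = new TreeMap<>();
        lowerBounds.put(0, "disk-a");     // keys 0..999
        lowerBounds.put(1000, "disk-b");  // keys 1000 and above
        System.out.println(byHash("order-42", 4));
        System.out.println(byRange(lowerBounds, 1500));
    }
}
```

    Either rule only decides where a row lives. Whether a unique constraint can be checked inside one partition or needs an index spanning all of them depends on whether the unique column is also the partitioning key.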
  10. Hi Chen, this technology is called "sharding". It is famous and popular; YouTube (Google) and other famous social networking websites may use it. It is different from Oracle partitioning. Oracle partitions normally run within a single database instance, and to grow bigger, Oracle needs more expensive storage. Oracle partitions are used to split disk I/O bandwidth; we call this a "scale up" technology. Sharding is a fully distributed structure and can be "scaled out" with more cheap machines. Best Regards, Wang Yu
  11. Please note that the Oracle Coherence Data Grid (which is Java middleware) supports dynamic partitioning of sets of objects across commodity hardware. It has all of the benefits of sharding, but also provides data integrity and continuous availability of information, even when there is a server failure. It is used by many of those "famous and popular" large-scale sites, such as Orbitz and FedEx.com. Peace, Cameron Purdy, Oracle Coherence: Data Grid for Java, .NET and C++
  12. In sharding, data uniqueness may be enforced partly on the client side (wrapped in an API), so a "global index" is not necessary. Wang Yu
  13. If data uniqueness is enforced by a client-side API, then data integrity (regarding uniqueness) is not enforced by the database. I am not sure about the conclusion that a "global index" is not necessary. That is almost like saying that since the client side enforces data integrity, database integrity checking is not necessary. Where can I find more about sharding? Chester
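
    One way to read the disagreement about the global index is sketched below in Java. It assumes the unique column is also the shard key, so every possible duplicate lands on the same shard and a local check is enough, while uniqueness on any other column still has to look at all shards. This is an illustration only: the class ShardedUserStore and its methods are hypothetical, and in-memory maps stand in for separate database instances with their own local unique indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: each in-memory map stands in for one database shard. */
final class ShardedUserStore {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedUserStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** The same user name always maps to the same shard. */
    int shardFor(String userName) {
        return Math.floorMod(userName.hashCode(), shards.size());
    }

    /**
     * Inserts the user only if the name is unused. Because userName is the shard key,
     * any duplicate would live on this one shard, so no cross-shard index is consulted.
     */
    boolean insertIfAbsent(String userName, String email) {
        Map<String, String> shard = shards.get(shardFor(userName));
        return shard.putIfAbsent(userName, email) == null;
    }

    /** Uniqueness on a non-key column (email) still requires visiting every shard. */
    boolean emailInUse(String email) {
        for (Map<String, String> shard : shards) {
            if (shard.containsValue(email)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ShardedUserStore store = new ShardedUserStore(4);
        System.out.println(store.insertIfAbsent("chester", "c@example.com"));
        System.out.println(store.insertIfAbsent("chester", "other@example.com"));
    }
}
```

    The emailInUse scan is the client-side equivalent of the cost a global index carries: it touches all shards, which is the concern raised in this post.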
  14. These places have more information: http://www.hibernate.org/414.html http://research.google.com/video.html You might also want to look at Coherence and play with it. Many data grids use the same concepts and ideas to partition and distribute data across a cluster. peter
  15. What about the performance of GridGain? GridGain looks to be a much more advanced solution for MapReduce than Hadoop...
  16. Pretty useless...

    An article that doesn't mention Coherence, GridGain, or GigaSpaces among other scalability-related projects is pretty shallow and/or clearly biased... Sad. Nikita Ivanov. GridGain - Grid Computing Made Simple
  17. Re: Pretty useless...

    Glad to hear someone else thinks the article is shallow. TSS needs to raise the bar on articles. peter
  18. No details

    Peter, I agree this article is frustrating. I looked for details on what the author did with JBossCache--nothing. I looked for what didn't work with Terracotta--nothing. And what the 9 Memcached apps were. There just seems to be lots of love for database-centric architectures. That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached and Terracotta-based apps. I will just have to reach out to the author to learn what he tried and found. Definitely our customers running up to 150-node applications will be shocked to learn we don't work at that scale :) There is even a user on our forums asking us to comment and correct "questionable comments and conclusions about Terracotta," in the article. I wish I could, but until I speak to the author, I reserve further comment. I guess Wang Yu should consider this an open invitation to sit down and discuss. Cheers, --Ari
  19. Re: No details

    Hi Ari, in this article I only showed the results of projects tested in our performance lab. The lab is open and free for our partners and other ISVs to test their solutions on all kinds of our server machines. The results are not officially announced benchmarks, so, as I mentioned in the article, "The test result is only reflected on the projects in our laboratory; your results may vary." Yes, we have a project tested on JBossCache, but I said nothing about it because the result for this project was bad. It would be unfair to say that JBossCache is not good: the result may have been caused by the limits of the developers' knowledge, misconfiguration of the product, or problems in the project's architecture. I mentioned Terracotta because the customer was satisfied with it. All the projects (tested in our lab) with Memcached and Terracotta used them as a distributed cache for the database. None of the projects was tested on both of them. As Terracotta is becoming popular in China (after your trip to Beijing), may I invite your technical staff to participate in upcoming projects if Terracotta is used? Best Regards, Wang Yu
  20. Re: No details

    Wang Yu, I understand now. I thought the projects were made up or somehow "concepted" by your team in order to test scale. Thanks for the clarification. Nice to hear that the Beijing trip was helpful for folks. I would be happy to participate in upcoming work. Cheers, --Ari
  21. Re: No details

    And again, we'd all like to know more details about the tests: configuration and setup, even the version tested (JBoss Cache scalability has been improving by leaps and bounds with each release). Setup is important too; see the link someone posted earlier in this thread on scalability with JBC when using buddy replication. Remember, without details of the test performed (data access patterns, cluster sizes, network types, etc.) and the product tested (version, configuration), it is very hard for anyone to gain much meaning from such a report, since no one has any idea whether any of this relates to their use case, and all the article would do is mislead. - Manik
  22. Re: No details

    I will have a hard time believing any article that talks about benchmarks, performance comparisons, etc. without providing the exact test configurations, deployables, network bandwidth, etc. I have lately worked on performance and scalability tests quite a bit and have seen that even a minor configuration change results in different metrics.
  23. Re: No details

    Hi prabhat, you are absolutely right! Even “providing the exact test configurations, deployables, network bandwidth” is not enough. To be a benchmark, a standard application is needed. So I will reiterate that this is not a benchmark to compare the performance of different products. This is an introductory article for beginners, to show them some solutions for scalability based on the projects tested in our local lab. The focus is on “scalability” instead of “performance”. For example, as I said in the article, if you want to make your Java applications more scalable, you may use a distributed cache; the following open source products (blah, blah) were tested in our lab, and some of them got very good results. I never compared performance between any two products, since none of the projects was tested on multiple platforms. Best Regards, Wang Yu
  24. Re: No details

    Ari - FWIW comparing Terracotta and Memcached is like comparing apples and oranges. Memcached is going to scale id-based read/write operations as linearly as the problem is partitionable, but Memcached doesn't provide the object graph or synchronization capabilities that Terracotta has. If the problem is sharing a mutable object graph, Terracotta (or JBoss AOP POJO Tree Cache or whatever the name is) are the applicable Java tools for the job. If the problem is identity-based (e.g. key-based caching), then Memcached is appropriate (or upgrade to Coherence ;-). Peace, Cameron Purdy Oracle Coherence: Data Grid for Java and .NET
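
    For readers unsure what "identity-based (key-based) caching" looks like in practice, here is a minimal Java sketch of the pattern. It does not use the Memcached, Terracotta, or Coherence APIs: the class KeyBasedCache is hypothetical, a ConcurrentHashMap stands in for the remote cache, and the loader function stands in for a database lookup by id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: a ConcurrentHashMap stands in for a remote key-value cache. */
final class KeyBasedCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                  // stands in for a database lookup by id
    private final AtomicInteger loads = new AtomicInteger();

    KeyBasedCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the id, calling the loader only on a miss. */
    V get(K id) {
        return cache.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Drops one entry, for example after the underlying row has been updated. */
    void invalidate(K id) {
        cache.remove(id);
    }

    /** Number of times the loader was called, i.e. cache misses. */
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        KeyBasedCache<Integer, String> users = new KeyBasedCache<>(id -> "user-" + id);
        System.out.println(users.get(7));       // miss: calls the loader
        System.out.println(users.get(7));       // hit: served from the map
        System.out.println(users.loadCount());
    }
}
```

    Each entry is an opaque value looked up by its id, which is why this kind of problem partitions cleanly across cache nodes. Nothing here tracks references between cached objects or coordinates concurrent mutation of a shared object graph, which is the capability this post attributes to Terracotta.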
  25. Compliments, interesting and useful article!
  26. Good article illustrating some less expensive solutions. Thank you Yu Wang. -wk
  27. First of all, I'd have to agree with Nikita's comment that the article is missing references to other commonly used products in this area.

    It's interesting to see that all the discussion around scalability is really centered on database scaling. Indeed, there are various methods to scale the database bottleneck, either through database clustering or in combination with an In-Memory Data Grid (IMDG). Memcache can solve read scaling (only for key-based queries), but it doesn't solve read/write scaling well and it doesn't really enable us to decouple the database from our application code. I would therefore consider it a relatively low-end solution that may improve application read scaling in certain scenarios.

    It is also not clear what consistency and high-availability constraints were applied in this test. As we all know, those two factors can have a significant impact on scaling and performance. At this point it's important to note that Memcache doesn't provide any transaction or high-availability support.

    I also found the mention of parallel processing and the use of MPI and MapReduce quite irrelevant to the topic, i.e. JEE applications have better and more native choices such as GridGain/GigaSpaces and, to a degree, Terracotta (I'm sure I missed a few names here). I wouldn't recommend that JEE users use either Hadoop or MPI before trying any of those native implementations. Hadoop is specifically geared toward parallel aggregation of distributed files (as in the case of a search engine); it becomes fairly limited when you try to position it as a general-purpose solution for parallel processing.

    I have listed below a few references that provide additional perspective and methods for scaling JEE applications:

    1. I wrote a blog post comparing database sharding/clustering with memory-based clustering and suggested a model for when to choose or use each of them here
    2. The following whitepaper discusses what end-to-end scaling means. The article is named: The Scalability Revolution - From Dead End to Open Road
    3. We (GigaSpaces) have done detailed research, lasting a few months, that compares the various types of scaling methods and their impact on end-to-end application scaling. We used a typical transactional J2EE application based on JMS as the feed, a SessionBean for managing the business logic, and a database for maintaining state and high availability. We separated the test into four parts, in which we measured the impact of the improvements we made on end-to-end application scaling. The steps were:

        1. Adding Hibernate as 2nd level cache
        2. Using the IMDG as the system of record and keeping the database asynchronously updated.
        3. Partitioning the messaging layer (the message queue).
        4. Scaling the entire application using Space Based Architecture, where we basically partitioned the entire application, not just the data, and collocated the relevant elements based on their runtime dependency.

    You can see preliminary results of that test here.

    Note:

    1. All of the tests used the exact same code. In the test we plugged in different (more scalable) middleware layers, keeping the code decoupled from those changes. The only thing that changed in a few of the steps is the DAO implementation.
    2. The full report and the code can be made available to those interested. The full details will be published soon. In the meantime, feel free to contact me directly if you have further questions on this matter.

    As you can see from those results, end-to-end scaling can be very different from measuring just data-layer scaling.

    Nati S.
    natishalom.typepad.com
    GigaSpaces
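
    As a rough illustration of step 2 above (the in-memory grid as the system of record, with the database updated asynchronously), here is a minimal write-behind sketch in Java. It is not the GigaSpaces API or the code used in the test described in this post: the class WriteBehindStore is hypothetical, a plain Map stands in for the database, and flush() is called explicitly where a real system would presumably run it from a background thread or scheduler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: memory is the system of record, the database lags behind. */
final class WriteBehindStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<>();
    private final Map<String, String> database;   // stand-in for a real database

    WriteBehindStore(Map<String, String> database) {
        this.database = database;
    }

    /** Writes go to memory immediately; the database write is only queued. */
    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    /** Reads are served from memory and never wait on the database. */
    String get(String key) {
        return memory.get(key);
    }

    /** Copies the latest value of every queued key to the database; returns the batch size. */
    int flush() {
        List<String> batch = new ArrayList<>();
        dirtyKeys.drainTo(batch);
        for (String key : batch) {
            database.put(key, memory.get(key));
        }
        return batch.size();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        WriteBehindStore store = new WriteBehindStore(db);
        store.put("order-1", "NEW");
        System.out.println(db.containsKey("order-1"));  // database not yet updated
        store.flush();
        System.out.println(db.get("order-1"));
    }
}
```

    The trade-off is that the database can be behind the in-memory state between flushes, so recovering from a failure relies on the grid's own replication rather than on the database.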