-
New Article: Scaling Your Java EE Applications - Part 2 (26 messages)
- Posted by: Nuno Teixeira
- Posted on: July 08 2008 12:11 EDT
Java applications can be scaled vertically (on a single system) or horizontally (across multiple systems). To do either, you have to understand all parts of the system and its software; otherwise, adding system resources or more systems may not deliver the gain you expect. Wang Yu presents some surprising results on Java application scalability, based on his experience in a performance laboratory. This second installment of the series discusses scaling horizontally.
Threaded Messages (26)
- link to part2 doesn't work for firefox by Chester Chen on July 08 2008 13:49 EDT
- Re: link to part2 doesn't work for firefox by xllerena xllerena on July 08 2008 17:31 EDT
- Re: link to part2 doesn't work for firefox by Shawn Kung on August 26 2008 02:21 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by fabrice armisen on July 08 2008 14:22 EDT
- #Part 1 Try this by Alex Au on July 08 2008 20:49 EDT
- Printer-friendly Link points to a different article by xinyu liu on July 08 2008 14:38 EDT
- JBossCache by Tobias Frech on July 08 2008 15:26 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Chester Chen on July 08 2008 15:42 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Yu Wang on July 08 2008 21:37 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Cameron Purdy on July 09 2008 16:14 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Yu Wang on July 08 2008 22:17 EDT
-
Re: New Article: Scaling Your Java EE Applications - Part 2 by Chester Chen on July 09 2008 09:44 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by peter lin on July 09 2008 10:36 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Jive User on July 09 2008 11:59 EDT
- Pretty useless... by Nikita Ivanov on July 09 2008 16:55 EDT
- Re: Pretty useless... by peter lin on July 09 2008 18:41 EDT
-
No details by ARI ZILKA on July 09 2008 22:30 EDT
-
Re: No details by Yu Wang on July 10 2008 04:05 EDT
- Re: No details by ARI ZILKA on July 10 2008 06:28 EDT
- Re: No details by Manik Surtani on July 10 2008 06:49 EDT
-
Re: No details by prabhat jha on July 10 2008 18:06 EDT
- Re: No details by Yu Wang on July 10 2008 22:45 EDT
- Re: No details by Cameron Purdy on July 11 2008 01:07 EDT
- Just compliments for the article by Simone Avogadro on July 11 2008 04:31 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by w k on July 18 2008 10:49 EDT
- Re: New Article: Scaling Your Java EE Applications - Part 2 by Nati Shalom on July 27 2008 02:51 EDT
-
link to part2 doesn't work for firefox
- Posted by: Chester Chen
- Posted on: July 08 2008 13:49 EDT
- in response to Nuno Teixeira
Hi, for some reason the link to the Part 2 article is not clickable in Firefox (Linux). I had to use View Source to find the URL. Also, the "print friendly version" link at the end of the article brings up Part 1 instead of Part 2.
Chester
-
Re: link to part2 doesn't work for firefox
- Posted by: xllerena xllerena
- Posted on: July 08 2008 17:31 EDT
- in response to Chester Chen
-
Re: link to part2 doesn't work for firefox
- Posted by: Shawn Kung
- Posted on: August 26 2008 02:21 EDT
- in response to Chester Chen
Speaking of Hadoop and MapReduce, it'll be interesting to see how analytic databases adopt the MapReduce construct. Check out Aster Data - the first fully integrated MPP relational database to seamlessly integrate MapReduce: http://www.asterdata.com/product/mapreduce.html
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: fabrice armisen
- Posted on: July 08 2008 14:22 EDT
- in response to Nuno Teixeira
Where is Part 1?
-
#Part 1 Try this
- Posted by: Alex Au
- Posted on: July 08 2008 20:49 EDT
- in response to fabrice armisen
-
Printer-friendly Link points to a different article
- Posted by: xinyu liu
- Posted on: July 08 2008 14:38 EDT
- in response to Nuno Teixeira
It links to Part 1 of the article.
-
JBossCache
- Posted by: Tobias Frech
- Posted on: July 08 2008 15:26 EDT
- in response to Nuno Teixeira
JBossCache with buddy replication enabled is quite powerful as well and should scale very nicely. See http://jbossworld.com/downloads/pdf/wednesday/JBOSS_1-150pm_ClusterTuning_Bela_Ban.pdf
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Chester Chen
- Posted on: July 08 2008 15:42 EDT
- in response to Nuno Teixeira
The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. The partitions are associated with different tablespaces, which can be on different volumes. This allows the data partitions to be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created).
Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and the index then depends on all partitions.
-
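To make the hash versus range distinction in the post above concrete, here is a minimal Java sketch of how a row key could be mapped to a partition under each scheme. It is an illustration only and not Oracle's implementation; the class name `PartitionSketch`, the partition names, and the year-based bounds are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: maps a row key to a partition by hash or by range. */
class PartitionSketch {

    /** Hash partitioning: spreads keys across partitions, ignoring key order. */
    static int hashPartition(String key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Range partitioning: each map key is the inclusive lower bound of a partition. */
    static String rangePartition(TreeMap<Integer, String> lowerBounds, int key) {
        // floorEntry returns the entry with the greatest lower bound <= key.
        Map.Entry<Integer, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("key below lowest bound: " + key);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical partitions keyed by order year.
        TreeMap<Integer, String> byYear = new TreeMap<>();
        byYear.put(0, "p_archive");
        byYear.put(2007, "p_2007");
        byYear.put(2008, "p_2008");
        System.out.println(rangePartition(byYear, 2008));
        System.out.println(hashPartition("order-42", 4));
    }
}
```

Range partitioning keeps neighbouring keys together, which suits range scans; hash partitioning trades that away for a more even spread.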
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Yu Wang
- Posted on: July 08 2008 21:37 EDT
- in response to Chester Chen
> The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. The partitions are associated with different tablespaces, which can be on different volumes. This allows the data partitions to be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created).
> Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and the index then depends on all partitions.
Hi Chen,
This technology is called "sharding". It is famous and popular. YouTube (Google) and other famous social networking websites may use this technology. It is different from Oracle partitions. Oracle partitions normally run within a single database instance, and to grow bigger, Oracle needs more expensive storage. Oracle partitions are used to split disk I/O bandwidth; we call this a "scale up" technology. "Sharding" is a totally distributed structure and can be "scaled out" with more cheap machines.
Best Regards
Wang Yu
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Cameron Purdy
- Posted on: July 09 2008 16:14 EDT
- in response to Yu Wang
> This technology is called "sharding". It is famous and popular. YouTube (Google) and other famous social networking websites may use this technology.
> It is different from Oracle partitions. [..] "Sharding" is a totally distributed structure and can be "scaled out" with more cheap machines.
Please note that the Oracle Coherence Data Grid (which is Java middleware) supports dynamic partitioning of sets of objects across commodity hardware. It has all of the benefits of sharding, but also provides data integrity and continuous availability of information, even when there is server failure. It is used by many of those "famous and popular" large-scale sites, such as Orbitz and FedEx.com.
Peace,
Cameron Purdy
Oracle Coherence: Data Grid for Java, .NET and C++
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Yu Wang
- Posted on: July 08 2008 22:17 EDT
- in response to Chester Chen
> The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. The partitions are associated with different tablespaces, which can be on different volumes. This allows the data partitions to be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created).
> Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and the index then depends on all partitions.
In "sharding", data uniqueness may be enforced partly on the client side (wrapped in an API), so a "global index" is not necessary.
Wang Yu
-
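One possible reading of the "enforced on the client side" remark is sketched below: if the client API always routes a key to the same shard, a purely local uniqueness check on that shard is enough for that key. This is an assumption made for illustration, not a description of what the article's projects did; `ShardedUserStore` is a hypothetical name, and each in-memory map merely stands in for one database shard with a local unique index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a client-side API that routes by the unique key. */
class ShardedUserStore {
    // Each map stands in for one database shard with a local unique index on the key.
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedUserStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** The same user name always maps to the same shard. */
    int shardFor(String userName) {
        return Math.floorMod(userName.hashCode(), shards.size());
    }

    /** Returns true if inserted, false if the user name already exists. */
    boolean insert(String userName, String profile) {
        // Only the owning shard is consulted; no cross-shard (global) index is needed
        // because every duplicate of this key would be routed to this same shard.
        return shards.get(shardFor(userName)).putIfAbsent(userName, profile) == null;
    }

    /** Returns the stored profile, or null if the user name is unknown. */
    String find(String userName) {
        return shards.get(shardFor(userName)).get(userName);
    }

    public static void main(String[] args) {
        ShardedUserStore store = new ShardedUserStore(4);
        System.out.println(store.insert("alice", "first"));   // accepted
        System.out.println(store.insert("alice", "second"));  // rejected as a duplicate
    }
}
```

This only covers uniqueness of the sharding key itself. A uniqueness rule on any other column would still need a check across all shards, which is the concern about global indexes raised earlier in the thread.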
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Chester Chen
- Posted on: July 09 2008 09:44 EDT
- in response to Yu Wang
> > The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. The partitions are associated with different tablespaces, which can be on different volumes. This allows the data partitions to be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created).
> > Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and the index then depends on all partitions.
> In "sharding", data uniqueness may be enforced partly on the client side (wrapped in an API), so a "global index" is not necessary.
> Wang Yu
If data uniqueness is enforced by a client-side API, then data integrity (regarding uniqueness) is not enforced by the database. I am not sure about the "'global index' is not necessary" conclusion. It is almost like saying that, since the client side enforces data integrity, database integrity checking is not necessary. Where can I find out more about "sharding"?
Chester
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: peter lin
- Posted on: July 09 2008 10:36 EDT
- in response to Chester Chen
These places have more information:
http://www.hibernate.org/414.html
http://research.google.com/video.html
You might also want to look at Coherence and play with it. Many data grids use the same concepts and ideas to partition and distribute data across a cluster.
peter
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Jive User
- Posted on: July 09 2008 11:59 EDT
- in response to Nuno Teixeira
What about the performance of GridGain? GridGain looks to be a much more advanced solution for MapReduce than Hadoop...
-
Pretty useless...
- Posted by: Nikita Ivanov
- Posted on: July 09 2008 16:55 EDT
- in response to Nuno Teixeira
An article that doesn't mention Coherence, GridGain, or GigaSpaces among other scalability-related projects is pretty shallow and/or clearly biased...
Sad.
Nikita Ivanov.
GridGain - Grid Computing Made Simple
-
Re: Pretty useless...
- Posted by: peter lin
- Posted on: July 09 2008 18:41 EDT
- in response to Nikita Ivanov
> An article that doesn't mention Coherence, GridGain, or GigaSpaces among other scalability-related projects is pretty shallow and/or clearly biased...
> Sad.
> Nikita Ivanov.
> GridGain - Grid Computing Made Simple
Glad to hear someone else thinks the article is shallow. TSS needs to raise the bar on articles.
peter
-
No details
- Posted by: ARI ZILKA
- Posted on: July 09 2008 22:30 EDT
- in response to peter lin
Peter,
I agree this article is frustrating. I looked for details on what the author did with JBossCache--nothing. I looked for what didn't work with Terracotta--nothing. And what the 9 Memcached apps were.
There just seems to be lots of love for database-centric architectures.
That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached and Terracotta-based apps.
I will just have to reach out to the author to learn what he tried and found. Definitely our customers running up to 150-node applications will be shocked to learn we don't work at that scale :) There is even a user on our forums asking us to comment and correct "questionable comments and conclusions about Terracotta" in the article.
I wish I could, but until I speak to the author, I reserve further comment.
I guess Wang Yu should consider this an open invitation to sit down and discuss.
Cheers,
--Ari
-
Re: No details
- Posted by: Yu Wang
- Posted on: July 10 2008 04:05 EDT
- in response to ARI ZILKA
> Peter,
> I agree this article is frustrating. I looked for details on what the author did with JBossCache--nothing. I looked for what didn't work with Terracotta--nothing. And what the 9 Memcached apps were.
> There just seems to be lots of love for database-centric architectures.
> That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached and Terracotta-based apps.
> I will just have to reach out to the author to learn what he tried and found. Definitely our customers running up to 150-node applications will be shocked to learn we don't work at that scale :) There is even a user on our forums asking us to comment and correct "questionable comments and conclusions about Terracotta" in the article.
> I wish I could, but until I speak to the author, I reserve further comment.
> I guess Wang Yu should consider this an open invitation to sit down and discuss.
> Cheers,
> --Ari
Hi Ari,
In this article, I only showed the results of projects tested in our performance lab. The lab is open and free for our partners and other ISVs to test their solutions on all kinds of our server machines. The results are not officially announced benchmarks. So, as I mentioned in the article, "The test result is only reflected on the projects in our laboratory; your results may vary."
Yes, we had a project tested on JBossCache, but I said nothing about it because the result for that project was bad. It would be unfair to say that JBossCache is not good. The result may have been caused by the limits of the developers' knowledge, misconfiguration of the products, or problems in the architecture of the project.
I mentioned Terracotta because the customer was satisfied with it.
All the projects (tested in our lab) with Memcached and Terracotta used them as a distributed cache for the database. None of the projects was tested on both of them.
As Terracotta is becoming popular in China (after your trip to Beijing), may I invite your technical staff to participate in upcoming projects if Terracotta is used?
Best Regards
Wang Yu
-
Re: No details
- Posted by: ARI ZILKA
- Posted on: July 10 2008 06:28 EDT
- in response to Yu Wang
> I mentioned Terracotta because the customer was satisfied with it.
> All the projects (tested in our lab) with Memcached and Terracotta used them as a distributed cache for the database. None of the projects was tested on both of them.
> As Terracotta is becoming popular in China (after your trip to Beijing), may I invite your technical staff to participate in upcoming projects if Terracotta is used?
> Best Regards
> Wang Yu
Wang Yu,
I understand now. I thought the projects were made up or somehow "concepted" by your team in order to test scale. Thanks for the clarification. Nice to hear that the Beijing trip was helpful for folks. I would be happy to participate in upcoming work.
Cheers,
--Ari
-
Re: No details
- Posted by: Manik Surtani
- Posted on: July 10 2008 06:49 EDT
- in response to Yu Wang
> Yes, we had a project tested on JBossCache, but I said nothing about it because the result for that project was bad. It would be unfair to say that JBossCache is not good. The result may have been caused by the limits of the developers' knowledge, misconfiguration of the products, or problems in the architecture of the project.
And again, we'd all like to know more details about the tests: the configuration and setup, and even the version tested (JBoss Cache scalability has been improving by leaps and bounds with each release). Setup is important too; see the link someone posted earlier in this thread about scalability with JBC when using buddy replication.
Remember, without details of the test performed (data access patterns, cluster sizes, network types, etc.) and the product tested (version, configuration), it is very hard for anyone to gain much meaning from such a report, since no one has any idea whether any of this relates to their use case, and all the article would do is mislead.
- Manik
-
Re: No details
- Posted by: prabhat jha
- Posted on: July 10 2008 18:06 EDT
- in response to ARI ZILKA
I have a hard time believing any article that talks about benchmarks, performance comparisons, etc. without providing the exact test configurations, deployables, network bandwidth, etc. I have lately worked on performance and scalability tests quite a bit and have seen that even a minor configuration change results in different metrics.
-
Re: No details
- Posted by: Yu Wang
- Posted on: July 10 2008 22:45 EDT
- in response to prabhat jha
Hi prabhat,
You are absolutely right! Even "providing the exact test configurations, deployables, network bandwidth" is not enough. To be a benchmark, a standard application is needed. So I will reiterate that this is not a benchmark for comparing the performance of different products. It is an introductory article for beginners, showing them some solutions for scalability based on the projects tested in our local lab. The focus is on "scalability" instead of "performance". For example, as I said in the article, if you want to make your Java applications more scalable, you may use a distributed cache; the following open source products (blah, blah) were tested in our lab, and some of them got very good results. I never compared performance between any two of them, because none of the projects was tested on multiple platforms.
Best Regards
Wang Yu
-
Re: No details
- Posted by: Cameron Purdy
- Posted on: July 11 2008 01:07 EDT
- in response to ARI ZILKA
> [..] I looked for what didn't work with Terracotta--nothing. And what the 9 Memcached apps were.
> There just seems to be lots of love for database-centric architectures.
> That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached and Terracotta-based apps.
Ari - FWIW, comparing Terracotta and Memcached is like comparing apples and oranges. Memcached is going to scale id-based read/write operations as linearly as the problem is partitionable, but Memcached doesn't provide the object graph or synchronization capabilities that Terracotta has. If the problem is sharing a mutable object graph, Terracotta (or JBoss AOP POJO Tree Cache or whatever the name is) is the applicable Java tool for the job. If the problem is identity-based (e.g. key-based caching), then Memcached is appropriate (or upgrade to Coherence ;-).
Peace,
Cameron Purdy
Oracle Coherence: Data Grid for Java and .NET
-
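For readers unfamiliar with the identity-based (key-based) caching mentioned above, the following Java sketch shows the usual cache-aside read path: look the id up in the cache, and only on a miss read the database and populate the cache. It is a minimal illustration under stated assumptions; `CacheAsideSketch` is a hypothetical name, the `ConcurrentHashMap` merely stands in for a Memcached client, and the `Function` stands in for a database lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative only: cache-aside reads keyed by id. */
class CacheAsideSketch {
    // Stand-in for a key-value cache such as Memcached (assumption: simple get/put by key).
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for a database lookup by primary key.
    private final Function<String, String> database;
    // Counts how often the database was actually hit (not synchronized; demo only).
    int databaseReads = 0;

    CacheAsideSketch(Function<String, String> database) {
        this.database = database;
    }

    /** Returns the value for id, reading the database only on a cache miss. */
    String get(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.apply(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    /** Call after the row changes so the next read reloads it from the database. */
    void invalidate(String id) {
        cache.remove(id);
    }

    public static void main(String[] args) {
        CacheAsideSketch reader = new CacheAsideSketch(id -> "row-" + id);
        reader.get("7");
        reader.get("7");
        System.out.println("database reads: " + reader.databaseReads);
    }
}
```

Nothing here tracks relationships between cached values, which is the object graph capability the post attributes to Terracotta rather than to a key-value cache.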
Just compliments for the article
- Posted by: Simone Avogadro
- Posted on: July 11 2008 04:31 EDT
- in response to Nuno Teixeira
Compliments, interesting and useful article!
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: w k
- Posted on: July 18 2008 10:49 EDT
- in response to Nuno Teixeira
Good article illustrating some less expensive solutions. Thank you, Yu Wang.
-wk
-
Re: New Article: Scaling Your Java EE Applications - Part 2
- Posted by: Nati Shalom
- Posted on: July 27 2008 02:51 EDT
- in response to Nuno Teixeira
First of all, I have to agree with Nikita's comment that the article is missing references to other commonly used products in this area.
It's interesting to see that all the discussion around scalability is really centered around database scaling. Indeed, there are various methods to relieve the database bottleneck, either through database clustering or in combination with an In-Memory Data Grid (IMDG). Memcached can solve read scaling (only for key-based queries), but it doesn't solve read/write scaling well and it doesn't really enable us to decouple the database from our application code. I would therefore consider it a relatively low-end solution that may improve our application's read scaling in certain scenarios.
It is also not clear what consistency and high-availability constraints were applied in this test. As we all know, those two factors can have a significant impact on scaling and performance. At this point it's important to note that Memcached doesn't provide any transaction or high-availability support.
I also found the mention of parallel processing and the use of MPI and MapReduce quite irrelevant to the topic: JEE applications have better and more native choices such as GridGain/GigaSpaces and, to a degree, Terracotta (I'm sure I missed a few names here). I wouldn't recommend that JEE users use either Hadoop or MPI before trying any of those native implementations. Hadoop is specifically geared toward parallel aggregation of distributed files (as in the case of a search engine); it becomes fairly limited when you try to position it as a general-purpose solution for parallel processing.
I have listed below a few references that provide additional perspective and methods for scaling JEE applications:
1. I wrote a blog post comparing database sharding/clustering with memory-based clustering and suggested a model for when to choose or use each of them here
2. The following whitepaper discusses what end-to-end scaling means. The article is named: The Scalability Revolution - From Dead End to Open Road
3. We (GigaSpaces) have done detailed research, lasting a few months, that compares the various types of scaling methods and their impact on the application's end-to-end scaling. We used a typical transactional J2EE application based on JMS as the feed, a SessionBean for managing the business logic, and a database for maintaining the state and high availability. We separated the test into four parts, in which we measured the impact of the improvements we made on the end-to-end application scaling. The steps were:
   1. Adding a Hibernate 2nd-level cache.
   2. Using an IMDG as the system of record and keeping the database asynchronously updated.
   3. Partitioning the messaging layer (the message queue).
   4. Scaling the entire application using Space-Based Architecture, where we basically partitioned the entire application, not just the data, and collocated the relevant elements based on their runtime dependencies.
You can see preliminary results of that test here.
Note:
1. All the tests used the exact same code. In the tests we plugged in different (more scalable) middleware layers, keeping the code decoupled from those changes. The only thing that changed in a few of the steps is the DAO implementation.
2. The full report and the code can be made available to those interested. The full details will be published soon. In the meantime, feel free to contact me directly if you have further questions on this matter.
As you can see from those results, end-to-end scaling can be very different from measuring just the data-layer scaling.
Nati S.
natishalom.typepad.com
GigaSpaces
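Step 2 in the post above (an in-memory grid as the system of record, with the database updated asynchronously) is commonly called write-behind. The Java sketch below illustrates the idea only and is not GigaSpaces code: `WriteBehindSketch` is a hypothetical name, plain maps stand in for the grid and the database, and an explicit `flush()` call stands in for the background writer a real product would run.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: single-threaded write-behind from memory to a database. */
class WriteBehindSketch {
    private final Map<String, String> grid = new HashMap<>();     // in-memory system of record
    private final Map<String, String> database = new HashMap<>(); // stand-in for the database
    private final Set<String> dirtyKeys = new LinkedHashSet<>();  // keys not yet persisted

    /** Writes go to memory only and return without touching the database. */
    void put(String key, String value) {
        grid.put(key, value);
        dirtyKeys.add(key);
    }

    /** Reads are served from memory. */
    String get(String key) {
        return grid.get(key);
    }

    /** What the database currently holds; may lag behind the grid. */
    String readFromDatabase(String key) {
        return database.get(key);
    }

    /** Persists pending changes; repeated writes to one key collapse into one update. */
    int flush() {
        int written = 0;
        for (String key : dirtyKeys) {
            database.put(key, grid.get(key));
            written++;
        }
        dirtyKeys.clear();
        return written;
    }

    public static void main(String[] args) {
        WriteBehindSketch store = new WriteBehindSketch();
        store.put("order-1", "NEW");
        store.put("order-1", "PAID");
        System.out.println("before flush: " + store.readFromDatabase("order-1"));
        System.out.println("rows written: " + store.flush());
        System.out.println("after flush: " + store.readFromDatabase("order-1"));
    }
}
```

The trade-off is that the database lags behind the in-memory state until the next flush, so a real deployment needs the grid itself to be replicated or otherwise made durable.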