Messages: 26
New Article: Scaling Your Java EE Applications - Part 2
Java applications can be scaled vertically (on a single system), or horizontally (across multiple systems). But to do either, you have to understand all parts of the system and software. Not doing so could defeat the purpose of adding system resources or more systems. Wang Yu presents some surprising results of Java application scalability based on his experiences in a performance laboratory. The second installment of this series discusses scaling horizontally.
Message #262242
Link to Part 2 doesn't work in Firefox
Hi, somehow the link to the Part 2 article isn't clickable in Firefox (Linux). I had to use View Source to find the URL.
Also, at the end of the article, the "print friendly version" link brings up Part 1 instead of Part 2.
Chester
|
|
Message #262253
Re: New Article: Scaling Your Java EE Applications - Part 2
The data partitioning approach described in the last section is similar to Oracle table partitioning. Oracle table partitioning lets you partition data by hash or by data range. Each partition can be associated with a different tablespace, which can sit on a different volume, so the data partitions can be distributed across different disks. If one partition goes down, it may or may not affect the others (depending on how the indexes are created).
Another issue to consider with this type of data partitioning is indexing. If you have to enforce data uniqueness, you need a "global index" instead of a "local index", and that index then depends on all partitions.
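To make the two partitioning schemes mentioned above concrete, here is a minimal Java sketch of how a row key might be mapped to a partition by hash or by range. `PartitionChooser` and its method names are hypothetical and only illustrate the idea; this is not how Oracle implements partitioning.

```java
import java.util.List;

/** Hypothetical helper: picks a partition for a row key by hash or by range. */
final class PartitionChooser {

    /** Hash partitioning: spreads keys across partitions, but neighbouring keys land in unrelated partitions. */
    static int byHash(String key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /**
     * Range partitioning: upperBounds holds the exclusive upper bound of each partition
     * in ascending order; values at or above the last bound go to one extra overflow partition.
     */
    static int byRange(long value, List<Long> upperBounds) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (value < upperBounds.get(i)) {
                return i;
            }
        }
        return upperBounds.size();
    }

    public static void main(String[] args) {
        List<Long> bounds = List.of(1_000L, 2_000L, 3_000L);
        System.out.println("order 1500 -> range partition " + byRange(1_500L, bounds));
        System.out.println("customer-42 -> hash partition " + byHash("customer-42", 4));
    }
}
```

A uniqueness check on the partitioning key only has to look at the one partition the key maps to; a uniqueness check on any other column has to look at every partition, which is where the global index comes in.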
|
|
Message #262293
Re: New Article: Scaling Your Java EE Applications - Part 2
Hi Chen, this technique is called "sharding". It is well known and popular. YouTube (Google) and other well-known social networking sites may use it.
It is different from Oracle partitioning. Oracle partitions normally run within a single database instance, and to grow, Oracle needs more expensive storage. Oracle partitions are used to spread disk I/O bandwidth; we call this a "scale up" technology. "Sharding" is a fully distributed structure and can be "scaled out" by adding more cheap machines.
Best regards,
Wang Yu
|
|
Message #262299
Re: New Article: Scaling Your Java EE Applications - Part 2
In "sharding", data uniqueness may be enforced partly on the client side (wrapped in an API), so a "global index" is not necessary.
Wang Yu
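One possible reading of "enforced on the client side" is sketched below in Java. It assumes the unique column (here an e-mail address) is also the shard key, so a duplicate can only ever live on one shard. `ShardedUserStore` is a hypothetical name, and the in-memory maps merely stand in for separate databases; nothing here is taken from the article's projects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side API over shards; each in-memory map stands in for one database. */
final class ShardedUserStore {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedUserStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
    }

    /** Every value of the unique key maps to exactly one shard. */
    private Map<String, String> shardFor(String email) {
        return shards.get(Math.floorMod(email.hashCode(), shards.size()));
    }

    /**
     * Inserts the user only if the e-mail is not taken. Because the shard is chosen
     * from the e-mail itself, a duplicate can only exist on that one shard, so a
     * local check is enough. Returns false when the e-mail already exists.
     */
    boolean register(String email, String displayName) {
        return shardFor(email).putIfAbsent(email, displayName) == null;
    }

    String lookup(String email) {
        return shardFor(email).get(email);
    }

    public static void main(String[] args) {
        ShardedUserStore store = new ShardedUserStore(4);
        System.out.println(store.register("a@example.com", "First"));  // true: first use of this e-mail
        System.out.println(store.register("a@example.com", "Second")); // false: rejected on the owning shard
    }
}
```

The sketch only works because the unique column and the shard key coincide. Uniqueness on any other column would require asking every shard, which is the same concern as the global index.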
|
|
Message #262473
Re: New Article: Scaling Your Java EE Applications - Part 2
If data uniqueness is enforced by a client-side API, then data integrity (with respect to uniqueness) is not enforced by the database.
I am not sure about the "'global index' is not necessary" conclusion. That is almost like saying that because the client side enforces data integrity, database integrity checking is unnecessary.
Where can I find out more about "sharding"?
Chester
|
|
Message #262489
Re: New Article: Scaling Your Java EE Applications - Part 2
What about the performance of GridGain? GridGain looks to be a much more advanced solution for MapReduce than Hadoop...
|
|
Message #262513
Re: New Article: Scaling Your Java EE Applications - Part 2
This technology is called "sharding". It is famous and popular. YouTube (Google) and other famous social network websites may use this technology.
It is different from Oracle partitions. [..] "Sharding" is a totally distributed structure and can be "scaled out" with more cheap machines.
Please note that the Oracle Coherence Data Grid (which is Java middleware) supports dynamic partitioning of sets of objects across commodity hardware. It has all of the benefits of sharding, but also provides data integrity and continuous availability of information, even when a server fails. It is used by many of those "famous and popular" large-scale sites, such as Orbitz and FedEx.com.
Peace,
Cameron Purdy Oracle Coherence: Data Grid for Java, .NET and C++
|
|
Message #262537
Re: Pretty useless...
An article that doesn't mention Coherence, GridGain, or GigaSpaces among other scalability-related projects is pretty shallow and/or clearly biased...
Sad. Nikita Ivanov. GridGain - Grid Computing Made Simple
Glad to hear someone else thinks the article is shallow. TSS needs to raise the bar on articles.
peter
|
|
Message #262573
No details
Peter,
I agree this article is frustrating. I looked for details on what the author did with JBossCache--nothing. I looked for what didn't work with Terracotta--nothing. The same goes for what the 9 Memcached apps were.
There just seems to be lots of love for database-centric architectures.
That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached- and Terracotta-based apps.
I will just have to reach out to the author to learn what he tried and found. Our customers running applications of up to 150 nodes will definitely be shocked to learn we don't work at that scale :) There is even a user on our forums asking us to comment on and correct "questionable comments and conclusions about Terracotta," in the article.
I wish I could but until I speak to the author, I reserve further comment.
I guess Wang Yu should consider this an open invitation to sit down and discuss.
Cheers,
--Ari
|
|
Message #262647
Re: No details
Hi Ari, in this article I only showed the results of projects tested in our performance lab. The lab is open and free for our partners and other ISVs to test their solutions on all kinds of our server machines. The results are not officially announced benchmarks, so, as I mentioned in the article, "The test result is only reflected on the projects in our laboratory; your results may vary."
Yes, we had a project tested on JBossCache, but I said nothing about it because the result for that project was bad. It would be unfair to say that JBossCache is not good. The result may have been caused by the developers' limited knowledge, misconfiguration of the product, or problems in the project's architecture. I mentioned Terracotta because the customer was satisfied with it.
All the projects (tested in our lab) that used Memcached or Terracotta used them as a distributed cache for the database. None of the projects was tested on both of them.
As Terracotta is becoming popular in China (after your trip to Beijing), may I invite your technical staff to participate in upcoming projects where Terracotta is used?
Best regards,
Wang Yu
|
|
Message #262676
Re: No details
I mentioned Terracotta because the customer was satisfied with it.
All the projects (tested in our lab) that used Memcached or Terracotta used them as a distributed cache for the database. None of the projects was tested on both of them.
As Terracotta is becoming popular in China (after your trip to Beijing), may I invite your technical staff to participate in upcoming projects where Terracotta is used?
Best regards,
Wang Yu
Wang Yu,
I understand now. I thought the projects were made up or somehow "concepted" by your team in order to test scale. Thanks for the clarification. Nice to hear that the Beijing trip was helpful for folks. I would be happy to participate in upcoming work.
Cheers,
--Ari
|
|
Message #262678
Re: No details
Yes, we had a project tested on JBossCache, but I said nothing about it because the result for that project was bad. It would be unfair to say that JBossCache is not good. The result may have been caused by the developers' limited knowledge, misconfiguration of the product, or problems in the project's architecture.
And again, we'd all like to know more details about the tests: the configuration and setup, and even the version tested (JBoss Cache scalability has been improving by leaps and bounds with each release). Setup is important too; see the link someone posted earlier in this thread on scalability with JBC when using buddy replication.
Remember, without details of the test performed (data access patterns, cluster sizes, network types, etc.) and the product tested (version, configuration), it is very hard for anyone to gain much meaning from such a report, since no one has any idea whether any of this relates to their use case, and all the article would do is mislead.
- Manik
|
|
Message #262721
Re: No details
I have a hard time believing any article that talks about benchmarks, performance comparisons, etc. without providing the exact test configurations, deployables, network bandwidth, and so on. I have lately worked quite a bit on performance and scalability tests and have seen that even a minor configuration change results in different metrics.
|
|
Message #262726
Re: No details
Hi prabhat, you are absolutely right! Even “providing the exact test configurations, deployables, network bandwidth” is not enough. To be a benchmark, a standard application is needed.
So I will reiterate that this is not a benchmark comparing the performance of different products. It is an introductory article for beginners, showing them some solutions for scalability based on the projects tested in our local lab. The focus is on “scalability” rather than “performance”.
For example, as I said in the article, if you want to make your Java applications more scalable, you may use a distributed cache; the following open source products (blah, blah) were tested in our lab, and some of them got very good results. I never compared the performance of any two of them, because no project was tested on more than one platform.
Best regards,
Wang Yu
|
|
Message #262728
Re: No details
[..] I looked for what didn't work with Terracotta--nothing. The same goes for what the 9 Memcached apps were.
There just seems to be lots of love for database-centric architectures.
That's kewl. Memcached and sharding have their place. But Terracotta is all about avoiding the database, and I wouldn't expect someone to be good at both Memcached- and Terracotta-based apps.
Ari - FWIW comparing Terracotta and Memcached is like comparing apples and oranges. Memcached is going to scale id-based read/write operations as linearly as the problem is partitionable, but Memcached doesn't provide the object graph or synchronization capabilities that Terracotta has. If the problem is sharing a mutable object graph, Terracotta (or JBoss AOP POJO Tree Cache or whatever the name is) are the applicable Java tools for the job. If the problem is identity-based (e.g. key-based caching), then Memcached is appropriate (or upgrade to Coherence ;-).
Peace,
Cameron Purdy Oracle Coherence: Data Grid for Java and .NET
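The identity-based case described above (key-based caching) can be sketched in Java as a cache-aside lookup. This is only an illustration: `KeyCache` and its loader are hypothetical names, and a `ConcurrentHashMap` stands in for a Memcached-style key/value cache. It is not the Memcached, Terracotta, or Coherence API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper; the map stands in for a Memcached-style key/value cache. */
final class KeyCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for a database lookup by id

    KeyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only when the key is not cached yet. */
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops the entry after the underlying record changes, so the next get reloads it. */
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        KeyCache<Integer, String> users = new KeyCache<>(id -> "user-" + id);
        System.out.println(users.get(7)); // loaded through the loader
        System.out.println(users.get(7)); // served from the cache
    }
}
```

Each entry is independent of every other entry, which is why this style partitions so easily by key. A shared mutable object graph has no such independent unit, which is the distinction the post draws.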
|
|
Message #263672
Re: New Article: Scaling Your Java EE Applications - Part 2
Good article illustrating some less expensive solutions. Thank you Yu Wang. -wk
|
|
Message #264623
Re: New Article: Scaling Your Java EE Applications - Part 2
First of all, I'd have to agree with Nikita's comment that the article is missing references to other commonly used products in this area. It's interesting to see that all the discussion around scalability is really centered on database scaling. Indeed, there are various ways to address the database bottleneck, either through database clustering or by combining the database with an In-Memory Data Grid (IMDG). Memcached can solve read scaling (only for key-based queries), but it doesn't handle read/write scaling well and it doesn't really enable us to decouple the database from our application code. I would therefore consider it a relatively low-end solution that may improve application read scaling in certain scenarios.
It is also not clear what consistency and high-availability constraints were applied in this test. As we all know, those two factors can have a significant impact on scaling and performance. At this point it's important to note that Memcached doesn't provide any transaction or high-availability support.
I also found the mention of parallel processing and the use of MPI and MapReduce quite irrelevant to the topic, i.e. JEE applications have better and more native choices such as GridGain, GigaSpaces and, to a degree, Terracotta (I'm sure I missed a few names here). I wouldn't recommend that JEE users use either Hadoop or MPI before trying any of those native implementations. Hadoop is specifically geared toward parallel aggregation of distributed files (as in the case of a search engine); it becomes fairly limited when you try to position it as a general-purpose solution for parallel processing.
Below are a few references that provide additional perspective and methods for scaling JEE applications:
1. I wrote a blog post comparing database sharding/clustering with memory-based clustering and suggested a model for when to use each of them here
2. The following white paper discusses what end-to-end scaling means. It is titled: The Scalability Revolution - From Dead End to Open Road
3. We (GigaSpaces) have done detailed research, lasting a few months, that compares the various types of scaling methods and their impact on end-to-end application scaling. We used a typical transactional J2EE application based on JMS as the feed, a SessionBean for managing the business logic and, obviously, a database for maintaining state and high availability. We separated the test into four parts, in each of which we measured the impact of the improvements we made on end-to-end application scaling.
The steps were:
1. Adding a Hibernate 2nd-level cache.
2. Using an IMDG as the system of record and keeping the database asynchronously updated.
3. Partitioning the messaging layer (the message queue).
4. Scaling the entire application using Space-Based Architecture, where we partitioned the entire application, not just the data, and collocated the relevant elements based on their runtime dependencies.
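Step 2 above (memory as the system of record, with the database updated asynchronously) is commonly called write-behind. The Java sketch below shows the general idea only. `WriteBehindStore` is a hypothetical name, plain maps stand in for the data grid and the database, and an explicit `flush()` replaces the background thread a real system would use. It is not the GigaSpaces API or the code used in the test.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical write-behind store: memory is the system of record, the database lags behind. */
final class WriteBehindStore {
    private final Map<String, String> memory = new HashMap<>();   // stand-in for the in-memory data grid
    private final Map<String, String> database = new HashMap<>(); // stand-in for the relational database
    private final Queue<String> dirtyKeys = new ArrayDeque<>();

    /** A write completes as soon as memory is updated; the key is queued for later persistence. */
    synchronized void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    /** Reads are always served from memory. */
    synchronized String get(String key) {
        return memory.get(key);
    }

    /** Applies the queued writes to the database and returns how many queue entries were drained. */
    synchronized int flush() {
        int drained = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            database.put(key, memory.get(key)); // always persists the latest in-memory value
            drained++;
        }
        return drained;
    }

    /** What the database currently holds, or null if the write has not been flushed yet. */
    synchronized String persisted(String key) {
        return database.get(key);
    }

    public static void main(String[] args) {
        WriteBehindStore store = new WriteBehindStore();
        store.put("order-1", "NEW");
        System.out.println(store.persisted("order-1")); // null: not flushed yet
        store.flush();
        System.out.println(store.persisted("order-1")); // NEW
    }
}
```

The trade-off is that the database is briefly stale, so the in-memory tier itself has to be made highly available. That is why the consistency and availability constraints of a test matter when results are compared.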
You can see preliminary results of that test here.
Note: 1. All the tests used the exact same code. In each test we plugged in different (more scalable) middleware layers, keeping the code decoupled from those changes. The only thing that changed in a few of the steps was the DAO implementation.
2. The full report and the code are available to those interested. The full details will be published soon. In the meantime, feel free to contact me directly if you have further questions on this matter. As you can see from those results, end-to-end scaling can be very different from measuring just data-layer scaling.
Nati S. natishalom.typepad.com GigaSpaces