Apart from the server that the EJBs sit on, what factors (if any) limit the scalability and availability of Enterprise JavaBeans? In other words, what aspects would I have to consider to make the beans themselves scalable and highly available?
Thanks in advance,
Take a look at an article entitled "EJB Clustering" that I posted on www.onjava.com on 12/15/00. This should sum it up nicely for you. Also, I will have other articles on EJB-level load-balancing and failover techniques appearing there and here in the near future.
The EJB specification itself is a big limit.
The best approach to ensure the scalability of an EJB scenario is:
1. Put aside the bulk of the EJB spec when developing (see below).
2. Use only stateless EJBs that rely on the database to do the processing (the database is the only one that performs real work anyway). That is, in every "business method": get a connection, send the SQL, get the results, close the connection, return, and that's it.
3. Scale the database if you think you need to (have a look at the recent TPC-C and TPC-W benchmarks and do a little math; you'll probably see that you won't need more than a 4-processor machine with 1 GB of RAM if your site is not some kind of new planetary online bank).
4. Get a developer/designer/DBA who's really strong with data modelling so you'll get a good database schema. That's the decisive thing.
5. If needed, apply database-specific tuning.
Then you'll never have scalability problems.
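As a rough illustration of point 2, here is a minimal sketch of one such "business method" in present-day Java (try-with-resources did not exist when this thread was written). The class name, table, and column are hypothetical, and the unchecked wrapper exception is only one possible choice; a real session bean would obtain its DataSource from the container.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical stateless service: it keeps no per-client state, only a
// reference to a shared connection pool (the DataSource).
final class ProductQueries {
    private final DataSource pool;

    ProductQueries(DataSource pool) {
        this.pool = pool;
    }

    // Get a connection, send the SQL, read the result, close everything,
    // return. The table and column names are made up for illustration.
    String findProductName(long productId) {
        String sql = "SELECT name FROM product WHERE id = ?";
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, productId);
            try (ResultSet rs = ps.executeQuery()) {
                // Return the first column of the first row, or null if no row.
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (SQLException e) {
            // Assumption: callers prefer an unchecked failure.
            throw new IllegalStateException("query failed", e);
        }
    }
}
```

Because the object holds nothing that belongs to a particular caller, any instance can serve any request, which is what makes this style easy to cluster.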
By making all the beans stateless you can easily deploy a clustered app server (most of them only cluster this type of bean) if you consider it necessary.
That is, if you don't trust the app server, because I have yet to see a recommendation from a vendor like Oracle to cluster the database for high availability.
In your scenario, do you intend to handle the transactions for the database yourself?
Why not let entity beans handle the persistence and transactions? Do you have concerns about the performance of entity beans?
I would concur with Ian on this one: Don't be too afraid of entity beans. If used improperly, you will run into a bottleneck, but if used correctly, they can provide a variety of advantages over stateless session beans doing JDBC:
1) Many application server vendors have bean-bean relationship support as specified in the EJB 2.0 specification. Try doing a many-many relationship manually in a stateless session bean -- it's rough.
2) Many containers support entity EJB data caching.
3) WLS has a "grouping" mechanism as part of their container. You can specify "groups" of persistent fields that are loaded concurrently. So, if you have a bean with 10 fields, you could specify a group of 4 fields. Any method that accesses one of those fields will cause all of them to be loaded into memory while the other 6 remain out of memory.
4) Many containers implement lazy loading schemes that only bring data into memory on demand.
5) Many containers implement pessimistic PK locking and optimistic PK locking that automatically defers collisions to the database (or keeps them in memory). You can also do read-only entity beans that periodically load their data into memory (whether a client is invoking them or not).
There are other performance enhancements as well. The stateless session bean using JDBC is a very popular model and it will scale nicely, but just don't overlook the benefits of using entity beans, too.
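To make points 3 and 4 more concrete, here is a small plain-Java sketch of the "field group" idea: the first access to any field in a group fetches the whole group, and fields outside it stay unloaded until they are touched. This is not WLS code or any container's actual API; the class, the method names, and the loader callback are all hypothetical stand-ins for what a container would do internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of lazy loading with field groups.
final class LazyRecord {
    // Maps each field name to the group (list of fields) it belongs to.
    private final Map<String, List<String>> groupOfField = new HashMap<>();
    // Field values fetched so far.
    private final Map<String, Object> loaded = new HashMap<>();
    // Stand-in for the database read: given field names, returns their values.
    private final Function<List<String>, Map<String, Object>> loader;

    LazyRecord(Function<List<String>, Map<String, Object>> loader) {
        this.loader = loader;
    }

    // Declares that these fields should always be fetched together.
    void defineGroup(String... fields) {
        List<String> group = Arrays.asList(fields);
        for (String field : fields) {
            groupOfField.put(field, group);
        }
    }

    // Loads the field's whole group on first access; ungrouped fields
    // are fetched one at a time.
    Object get(String field) {
        if (!loaded.containsKey(field)) {
            List<String> toFetch =
                groupOfField.getOrDefault(field, Collections.singletonList(field));
            loaded.putAll(loader.apply(toFetch));
        }
        return loaded.get(field);
    }

    boolean isLoaded(String field) {
        return loaded.containsKey(field);
    }
}
```

With a group of ("name", "price"), reading "name" also brings "price" into memory in the same fetch, while a field such as "description" is not read until something asks for it.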
I didn't intend to go into polemics, I was just giving practical advice.
My point of view from a practical perspective (I do have a more in-depth perspective as well):
- Ease of development. SQL is better than Java for the kind of things that EJBs are trying to solve. After all, SQL really consists of 4 commands and is a 4th-generation language, and databases perform those commands very well. If you compare that to the tons of patterns surrounding this site, or to applications like the Java Blue Print from Sun, you'd better stay clear of Entity Beans.
- Flexibility. SQL is far more flexible than a framework of EJB entities.
- Performance. Databases really do perform very well.
About app servers, I wouldn't be so sure, at least at this time.
- Investment security. Relational databases are built upon a formal theoretical model that has been around for quite a while and has stood the test of time. And by the way, the model is quite simple and powerful, and you can express it in under 100 pages. If you think you saw something like that in the tons of pages of specs from Sun, please give me a pointer.
As regards your specific points:
1) are you trying to say that a SQL Relational Database Server cannot handle relations well?
2) data caching is far more advanced at the database level both in performance and in things like transactional integrity, transaction isolation and so on. There's no way you can claim WebLogic is better than Oracle or DB2 or the likes at data caching.
3 + 4) SQL only brings data that you require.
5) Many containers can increase the concurrency collisions and locking conflicts by reducing the flexibility of the database lock manager.
Anyway, these rather fine points aside, there are overwhelming practical considerations, as well as examples of practical project failures (just look at their proud Java Blue Print Application on the J2EE site), for staying away from the Entity EJB model until Sun manages to get things in order.
The least I would expect from Sun is to prove that this model works. They could, for example, implement TPC-W and show some results; after all, it is very similar to the Java Blue Print application.
First of all I'd like to thank you all for such a good response to the question I posted. There seem to be many differences of opinion as to whether stateless session beans or entity beans should be employed in an application. Despite the thorough response I still have a number of questions. What advantages do Entity Beans have over Session Beans, considering that they are so much more difficult to implement? And nobody seems to have mentioned Stateful Session Beans; does anyone recommend their use?
Stateful session beans are good in their own right, but only for handling things like a shopping cart, for example: intermediary objects that support user interactions but will not be stored in the database.
This type of functionality is already present in the Servlet/JSP layer, and it is arguably better to leave it there.
Since the most complex task in business applications is handling database transactions, you then have to choose between Entity Beans, which try to offer an OO view of the data, and Stateless Session Beans, where you mainly use SQL code to express your business transactions over the data.
The perceived advantage of Entity Beans is that they offer an Object Oriented "view of the world".
Some authors/authorities in this domain still recommend using Session Beans to provide business logic wrapping around Entity Beans as a best practice.
It really depends on whether the developers on the project are so uncomfortable with SQL that they would rather suffer the penalties of Entity Beans as another way of representing data.
Many people will say they offer a performance edge by using advanced caching techniques, though one can hardly prove such a thing. I am more than skeptical on this subject.
I would call it another fancy way of moving bytes around, but you have to consider these:
- I'm no expert (only some friends think I'm knowledgeable), certainly not an author/authority
- I'm biased against fancy architectures, marketing hypes, and any kind of technology that doesn't have some kind of theoretical model/ formal proof or things like that to help us mere mortals to understand its "raison d'etre"
- I may be very ignorant - the more you know the more you realise how much you don't know.
So you should definitely compare others' opinions, but take care to keep an open mind on things.
<Database vendor dude chiming in>
When we talk about performance of data caching and how it relates to doing so in the EJB layer or the database layer, you need to understand how each layer caches the data. For instance, yes, an RDBMS does generally have a very robust caching scheme, but it's sometimes crippled by its complexity. For instance, the RDBMS must maintain consistency; the EJB does not. This can be good and bad. If consistency isn't an issue (read-only), caching in the EJB layer may be better.
Secondly, realize what the RDBMS must do to cache data. For instance, imagine a query for "hot" data, or data that is actually contained in the RDBMS cache.
1) Network call
2) Call is demarshalled
3) SQL is parsed
6) MRU-LRU chain is traversed
7) Data is found
8) Pointer pages updated
9) MRU-LRU chain "pushed"
10) Data marshalled back
Now, compare this with pulling a cached version from the app server, where we might simply be doing a hash lookup and returning a pointer. An RDBMS has elegant caching, but in some ways it's klutzy.
In some tests we've found caching on the app server side can be tens of times faster than in the RDBMS. It depends on your app, your data, and what happens to that data. As I like to say, your mileage may vary.
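The "hash lookup and return a pointer" path can be sketched in a few lines of plain Java. This is only an illustration for read-only data, not any app server's real cache; the class name and the loader callback (standing in for the trip to the database) are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical app-tier cache for data that never changes once loaded.
final class ReadOnlyCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    // Stand-in for the database round trip on a cache miss.
    private final Function<K, V> loader;
    private final AtomicInteger misses = new AtomicInteger();

    ReadOnlyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // A hit is a single hash lookup that returns a reference;
    // only a miss pays for the loader call.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            misses.incrementAndGet();
            return loader.apply(k);
        });
    }

    int misses() {
        return misses.get();
    }
}
```

Note what is missing compared with the RDBMS path: no network call, no parsing, and also no consistency guarantee, which is why this only suits data that does not change underneath the cache.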
On the comment that the best way to increase the performance of an EJB app is to throw away the spec: I'm sorry, that is totally absurd. Let's be honest, the ONLY reason anyone should do ANYTHING in EJB is for the portability. If portability isn't your requirement, then I might suggest another framework than EJB, as most proprietary app server frameworks do indeed perform better than EJB. Sooo, the net-net is: why use EJB if you don't need portability? The fact that you have chosen EJB and then decide to ignore the spec and portability is, in my mind, nonsensical.
Internet Applications Division
<great to spark interest>
Coming from a database vendor insider, your observations are very valuable.
Of course, you could have assumed it is well known what a database generally does.
I suspect that because you work in the Internet Applications Division, your test of EJB/database caching is a little biased, so your colleagues from the Sybase Database Division may disagree a little bit.
But I'll try to respond myself:
Your toughest point is consistency, which you think EJBs are free to ignore on some occasions ("read-only").
Consistency is fundamental in business applications that are now migrating to the Internet.
Of course, you are trying to say that there is some data that largely stays unmodified (let's say a lookup table with Product_Name, Product_Id ...). But then only a biased test can show the EJB performing ten times better by doing a simple hashtable lookup, because a non-EJB application with the same knowledge of the data will maintain a hashtable as well; only it doesn't necessarily need to contain EJB instances, and it will perform at least as well, probably a lot better (you know what I'm talking about).
The reason why I said to throw away large parts of the EJB spec is that it's far from ready for prime-time production, IMHO anyway, and you admitted yourself that proprietary frameworks do perform better.
In the meantime, if you don't think that EJB app server vendors did lousy work, you have to admit that EJB apps perform worse because some flaws are in the spec itself, especially in the Entity Bean part.
You have the consolation that EJBs are "portable".
Of course from a marketing point of view it holds a little.
But what if you're a project manager with the business problem in front of you and a six-month deadline: if you have had WebLogic + Oracle 8i for 3 months and preliminary tests don't look good, will you be happy to move to Sybase EAS + Sybase 12?
I think that because the deployment of applications is so rapid, people are mixing up EJBs with the database and entity beans with database records. The finer points are being left out, things like instance pooling. The same is happening with object models. For lack of clarity, people are mixing up object models with ER diagrams without appreciating their different inherent characteristics, this being the data-driven era of the Internet. Three-tier architectures will all look the same, because they have three tiers. That is the reason attention needs to be paid to issues like data caching, instance pooling, transaction isolation and web server clustering. People who lack the patience to understand these issues might just prefer doing away with EJB altogether. I agree with Dave in his approach to entity beans: whether the access is read-only or read-write has to be determined before jumping to conclusions. Stateful session beans won't scale unless you pay more attention to object passivation and activation.
P bomma kumar
I generally agree with your approach, in that I think that most app servers' implementations of bean-managed persistent entity beans are ridiculously impractical. I find it somewhat interesting that Sun's EJB 2.0 specification actually *discourages* the use of stateless session beans - apparently they want to emphasize session beans as "client-related state" objects. How odd.
A couple of points that I have issue with:
- EJB was *never* intended to be primarily a database wrapping layer. It is a distributed object runtime model. It just happens that most applications fall under the "data driven systems" category. It would be much better supported by an object/relational mapper. Sadly, Sun didn't realize this until recently (EJB 2).
- A good object/relational mapper can provide tremendous productivity gains by auto generating SQL code. If I had one of these tools (i.e. TopLink or Persistence PowerTier), I would be much more open to using entity beans. This enables the time to market benefits of writing *no* SQL code to start, and then to find your bottlenecks afterwards and to optimize with stored procedures or whatnot.
- Relational databases do not support *certain kinds* of relations well. They specifically do not support navigational access very well. Now, SQL3 with REF objects may be able to change this, but preliminary experiences with Oracle 8i haven't swayed me. A transactional object database/cache such as those offered by Gemstone or Persistence is a tremendous performance boost for applications with complex data relations and mainly navigational access patterns.
- Data caching in an RDBMS is excellent, but see the above. There's something to be said for "application bias" in caching. RDBMS tend not to have that "bias" that an object cache has. This makes an RDBMS an excellent shared data store, but not so good at being malleable to particular access patterns beyond general "warehouse vs. oltp" optimizations.
- Regarding locking. One of the most intriguing models I've seen is the GemStone model of providing true optimistic transactions. You are given a database view at the start of your transaction and are *guaranteed* to have that unique view until your transaction aborts / commits. This is performed without locking. They use a combination of data versioning and garbage collection to implement this at the data level, so it has a cost -- but it is an excellent model as it allows for "on demand" object locking, at whatever granularity you wish. This is tremendously powerful.
Some object relational mappers provide a similar, though weaker, form of this.
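The general idea of "a private view plus data versioning" can be sketched as a tiny in-memory multi-version store. To be clear, this is not GemStone's implementation or API, just a toy under stated assumptions: every name here is hypothetical, old versions are never garbage collected, and the synchronized blocks only protect the in-memory map (no per-object lock is held for the life of a transaction).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy multi-version store: each transaction reads from the snapshot that
// existed when it began, and a commit fails if a key it wrote was
// committed by someone else in the meantime.
final class VersionedStore {
    // One committed version of a value.
    private static final class Version {
        final long commitNo;
        final String value;
        Version(long commitNo, String value) {
            this.commitNo = commitNo;
            this.value = value;
        }
    }

    private final Map<String, List<Version>> history = new HashMap<>();
    private long lastCommit = 0;

    synchronized Tx begin() {
        return new Tx(lastCommit);
    }

    final class Tx {
        private final long snapshot;                 // view fixed at begin()
        private final Map<String, String> writes = new HashMap<>();

        private Tx(long snapshot) {
            this.snapshot = snapshot;
        }

        // Sees its own writes first, then the newest version that is not
        // newer than the snapshot.
        String read(String key) {
            if (writes.containsKey(key)) {
                return writes.get(key);
            }
            synchronized (VersionedStore.this) {
                List<Version> versions = history.get(key);
                if (versions == null) {
                    return null;
                }
                for (int i = versions.size() - 1; i >= 0; i--) {
                    if (versions.get(i).commitNo <= snapshot) {
                        return versions.get(i).value;
                    }
                }
                return null;
            }
        }

        // Buffered locally until commit.
        void write(String key, String value) {
            writes.put(key, value);
        }

        // Returns false (and changes nothing) on a write-write conflict.
        boolean commit() {
            synchronized (VersionedStore.this) {
                for (String key : writes.keySet()) {
                    List<Version> versions = history.get(key);
                    if (versions != null
                            && versions.get(versions.size() - 1).commitNo > snapshot) {
                        return false;
                    }
                }
                long commitNo = ++lastCommit;
                for (Map.Entry<String, String> e : writes.entrySet()) {
                    history.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                           .add(new Version(commitNo, e.getValue()));
                }
                return true;
            }
        }
    }
}
```

The cost mentioned above shows up directly: every committed write adds a version that something eventually has to clean up.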
Of course, you are right in your own way about specific things, after all, the only tried and true software engineering principle that holds well against all technologies is expressed by the famous pattern "there's no silver bullet". Or should we say depleted uranium now :)
I want to respond to your points and also be as brief as possible.
Java was *never* intended for the server side. It ended up this way, and we have to live with some limitations. EJBs were intended as a new marketing hype to build upon the success of the JavaBean keyword.
Of course, if you only use stateless session beans, then sooner or later you end up with the question: why pay tons of money for app server licenses and big hardware? A simple Object Request Broker will do.
And this really IS a valid question.
So, Sun has its point in discouraging the use of stateless session EJB.
Another point may be to cover their own ignorance when they chose the term, because there's no such thing as a session without a state.
You seem to favor true object-relational mapping tools/frameworks, which have been around for some time, and they are certainly a much better option than Entity Beans.
An ideal framework would provide an OO, transactional aware and application aware cache over a real database, as you pointed out.
Then if your developers feel much more comfortable with OO concepts than with database concepts, they certainly gain productivity.
That is not to say that developers experienced with SQL would be less productive. Arguably, the contrary may hold.
However, as regards the performance of this solution, there are drastic limitations on what the transactional cache can do while preserving the integrity of the database.
The best scenario is when the application and the database are totally new and you expect the database to be accessed only through the app server/persistent cache. Then you can get a performance boost from the persistent cache.
If you build upon an already existing database that is accessed by applications using other technologies, then there's no way the OR mapping tool can enhance performance, although they may claim to.
And their "guaranteed" unique views, transactions without locking, and so on ... come on, did somebody revolutionize database theory in the meantime?
Sorry, one can hardly keep up with the pace of technology these days :)
In any case, from a practical standpoint, where do you go if you have to build an application/system?
Maybe wait a couple of months for the upcoming Java Data Objects specification, another proof that Entity EJBs were not done right in the first place.
Then wait for the implementations?
Using OR mapping tools is definitely a way to go - not the only way, but certainly one shouldn't buy into EJB marketing hype.
The terms stateless and stateful refer to conversational state, or state that cannot be shared among clients. This does not mean that a stateless session bean does not have state. Stateless session beans have much better scalability characteristics because they can be shared between clients. Entity and stateful session beans cannot be shared. Entity beans introduce a lot of overhead. Add in passivation/activation issues and I'd rather go for the JSP/Servlet OR mapping tool solution, as relational DBs are plenty fast enough to handle most situations.
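The distinction can be shown with two plain Java classes (deliberately not real EJBs; both class names are made up). The first has state, a configured tax rate, but nothing tied to a particular caller, so one instance can serve everyone. The second holds conversational state and therefore needs one instance per client.

```java
import java.util.ArrayList;
import java.util.List;

// "Stateless" in the conversational sense: the tax rate is state, but it
// is the same for every caller, so instances are interchangeable and a
// container could pool and share them freely.
final class PriceCalculator {
    private final double taxRate;

    PriceCalculator(double taxRate) {
        this.taxRate = taxRate;
    }

    double totalWithTax(double net) {
        return net * (1 + taxRate);
    }
}

// "Stateful": the item list belongs to exactly one client conversation,
// so each client needs its own instance, which must be kept alive (or
// passivated and later activated) between calls.
final class ShoppingCart {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    // Returns a copy so callers cannot modify the cart's internal list.
    List<String> items() {
        return new ArrayList<>(items);
    }
}
```

Ten thousand clients can share a handful of PriceCalculator instances, but they need ten thousand ShoppingCart instances, which is where the scalability difference comes from.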
FoodSmart, an EJB application built by GemStone, only uses stateless session beans in the service layer. Rather than returning a large result set to the client, the SLSB wraps it in an entity bean and passes a handle back to the client. This way the session bean does not have to maintain conversational state. You get the advantages of using the most scalable container. IMHO, using entity beans as an iterator is probably the most useful thing you can use them for.
In regards to the clustering article, the primary target was clustering at the application server level. This introduces a significant cache coherency problem that was never mentioned. How do you deal with this? IME, using distributed transactions to maintain cache coherency kills performance. I don't really see this as a viable answer on most high volume sites.
Most implementations I've seen, outside of some boutique ones like PowerTier, use flavors of optimistic locking to maintain cache concurrency. This is a passive approach, but simple and much less error prone than active solutions through cache update notifications.
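One common flavor of this is a version column: each cluster node works from whatever copy it has cached, and the write only succeeds if the row's version is still the one the node read. The sketch below emulates that check in memory; the class and method names are hypothetical, and in a real system the check would be a conditional SQL update rather than a map lookup.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table with a version column.
final class VersionedTable {
    // Immutable snapshot of a row, as a node might hold it in its cache.
    static final class Row {
        final String value;
        final int version;
        Row(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 1));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    // Emulates:
    //   UPDATE t SET value = ?, version = version + 1
    //   WHERE id = ? AND version = ?
    // Returns false when the cached copy was stale, so the caller can
    // refresh its cache and retry or report the conflict.
    synchronized boolean update(long id, String newValue, int expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
        return true;
    }
}
```

Nothing is broadcast between nodes here: a stale cache is only discovered when its owner tries to write, which is what makes the approach passive.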
Internet Applications Division