Sun has posted JSR 117: J2EE APIs for Continuous Availability to the JCP. The goal of this JSR is to specify a programming model and to standardize the APIs for some of the functions that are essential to continuous-availability applications. Failover, online upgrading, error handling, and logging are just some of the topics covered.
Although J2EE was designed to be as easy and 'transparent' to the developer as possible, in some ways hiding too many system services may have been counterproductive, as it forces developers to rely on proprietary vendor extensions in certain situations. Development of this JSR would give developers more choices and address the needs of large-scale, highly available applications. Wait a sec, isn't every Web site supposed to be highly available? :)
JSR 117 Homepage
No, not every web site is supposed to be "highly available"; this is usually a bad target to chase for many projects.
Yes, trading sites during the trading hours should be highly available. There certainly are exceptions.
But there was an article in PC Week magazine some time ago which clearly demonstrated that shooting for 99.999% uptime is not economically justified.
Automatic failover, yep, that's great, but can someone point to a software system with no "single point of failure"?
Failover that point if you can.
Yep, just another in the long series of APIs, JSRs, and other blah blah.
Sun risks losing its credibility trying to make Java a "one size fits all".
Whether a web site needs to be highly available completely depends on the RAS requirements. It's the people paying for it that determine whether it's appropriate or not.
If J2EE servers can have features that mean you don't need a lot of rocket scientists and custom frameworks to do HA then I think it's a good thing. You still need a thinker on the team who knows what he/she is doing but not everyone needs to be a thinker and more importantly, there aren't that many 'thinkers' around in any case. So, if they make it easier then I think it's a good thing.
5 9s, like you say, is very difficult to do in real life. But 2 or 3 9s may be good enough for most people. If J2EE servers make this possible for average teams to achieve at reasonable cost then I say it's a good thing. Again, the customer decides what they are willing to pay for.
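To put some numbers on what each extra "nine" buys you, here's a small sketch of the arithmetic (the class and method names are mine, just for illustration):

```java
// Downtime budget per year implied by an availability target.
// Illustrative arithmetic only -- real SLAs define their own
// measurement windows and exclusions.
public class DowntimeBudget {
    static final double MINUTES_PER_YEAR = 365.0 * 24 * 60;

    // e.g. nines = 3 means 99.9% availability
    static double downtimeMinutesPerYear(int nines) {
        double availability = 1.0 - Math.pow(10, -nines);
        return MINUTES_PER_YEAR * (1.0 - availability);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.printf("%d nines: %.1f minutes/year%n",
                              n, downtimeMinutesPerYear(n));
        }
    }
}
```

Roughly: 2 nines allows about 3.6 days of downtime a year, 3 nines about 9 hours, while 5 nines leaves you barely 5 minutes, which is why the cost curve gets so steep.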
As for the 'show me' statement: most HA systems are really about limiting the interruption of service to a specified maximum interval when a fault occurs. They should also be able to work at a possibly reduced level of functionality when faults occur, or simply delay the completion of pending business processes until the failed system restarts, i.e. ensure correct behaviour in the presence of faults. What's acceptable is obviously a parameter when the system is designed.
Ideally this interruption interval is 0 seconds, i.e. continuous availability, and the delay to the completion of a task should be zero as well. In practice we may be happy with 10 seconds, a minute, 10 minutes, etc. for the interruption interval or the completion delay.
For an example of such a system, look at the NYSE. Tandem recently ran an ad claiming that the NYSE hasn't had a single second of downtime in 10 years, how many nines is that? More than 5 I'd guess. Click here
to read about the system. What's to stop Compaq building a J2EE layer on top of Tandem NonStop that uses the J2EE contracts and provides a Java environment to host J2EE applications at this level of service?
J2EE is only a bunch of interfaces, so this is entirely possible. I'd agree that it's almost impossible for a generic J2EE server to implement this level of availability, but on the right platform, why not? Sun needs to add these interfaces so that a J2EE application can be written on a low-end platform and later moved over to a platform such as a Tandem. It's all about making J2EE scalable in terms of platform support, from a single PC to a Himalaya cluster. That's scalable from the point of view of RAS requirements as opposed to performance.
I disagree with your statement and think it's more appropriate to say "0 seconds of service interruption and/or processing delay given a failure may be unrealistic depending on the components in the system".
Anyway, as usual it all depends but I don't see why them doing this is a bad thing.
Why is it bad?
It is definitely not bad; they're not harming anyone, except perhaps indirectly.
But shall we discuss a little more in depth ?
There are two things we might be talking about as you also pointed:
a) Quick recovery from failures (minimizing downtime).
Here the databases and the transaction manager take all the responsibility; J2EE app servers have no business here, since they don't do transactions and don't maintain logs.
Even if they had some role to play (some imaginative guy might come up with examples), doing it through an API is ridiculous.
Do you speak a different SQL with Oracle if you implement a standby database for quick recovery? No, you don't.
b) Failover without disruption of service (like in some clusters). Here the things are a little bit more complicated.
One thing you should know: you can't do it with your database(s). You can do a little if your database is a parallel, shared-disk architecture, but the single point of failure is the disk.
Still, the SQL is the same; there's no API involved.
Now some app servers claim to support such a thing.
For instance they may store session in the database.
But that is just an admission that app servers and JVMs are much less reliable than databases.
They do so at a terrible cost.
Still, what's the point in having an API for high availability ?
Making J2EE apps even more complicated, adding another 10 patterns to the J2EE patterns site, and publishing ten more books at Sun Press?
As regards Tandem Himalaya servers and NonStop SQL,
the key is good-quality software and redundant, good-quality hardware.
Still NonStop SQL uses standard SQL.
And my thinking is that it would really be a shame if Compaq were to build a J2EE layer on top of NonStop SQL (like Oracle did?).
I don't agree; you're oversimplifying the situation by far. If only things were so simple, my job would certainly be a lot easier.
According to your post, the only thing that can fail is the database. I respectfully disagree. Servers hosting business logic are more likely to fail: you'll have one database but, say, 4 servers running business logic, and by the law of numbers the logic servers are more likely to fail. These failures need to be masked. RMI/IIOP clustering (I'd prefer messaging, but let's say RMI/IIOP) can help mask this failure, and it has nothing to do with databases. The app server can roll back the transaction if there is one, and the client can reissue the request on another node.
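To make the "reissue on another node" idea concrete, here is a rough sketch of a client-side failover wrapper. The `BusinessService` interface and `FailoverClient` class are invented for illustration; real vendors bake this into their generated stubs, and the request must be idempotent or covered by a rolled-back transaction for the retry to be safe:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Hypothetical remote business interface -- names are illustrative.
interface BusinessService extends Remote {
    String placeOrder(String orderId) throws RemoteException;
}

// Minimal client-side failover: if one replica fails, the request
// is reissued against the next one in the list.
public class FailoverClient {
    private final List<BusinessService> replicas;

    public FailoverClient(List<BusinessService> replicas) {
        this.replicas = replicas;
    }

    public String placeOrder(String orderId) throws RemoteException {
        RemoteException last = null;
        for (BusinessService node : replicas) {
            try {
                return node.placeOrder(orderId); // first healthy node wins
            } catch (RemoteException e) {
                last = e;                        // node down -- try the next
            }
        }
        throw last;                              // all replicas failed
    }
}
```

The point is that the business logic never sees the failure; the masking lives entirely in the plumbing.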
An API for failures is ridiculous?
I disagree. Look at Tibco RV 6.6. It has a fault-tolerant API for building high-end message-processing clusters. This is very useful in the area I work in, and it has nothing to do with a database. I'd like to see this feature implemented by message bean containers and am currently working on this.
RMI/IIOP clustering is another example of an API, I think it's of limited use but it's there if you want it.
Single point of failure in HA databases is the disk?
Again, I respectfully disagree: disks can be made redundant, eliminating single or multiple points of failure depending on cost. Mirrors, with each half of the mirror on separate controllers, cables, even PCI buses. No rocket science, just common sense. Storage servers are making this easier if you can afford them.
Servers storing state persistently?
Your point about the database being more reliable than the server is just not accurate. If a component has state then it needs to make it persistent to be recoverable when a failure occurs. WebLogic does this with memory-based replication, WebSphere with a database. This has nothing to do with the relative reliability of the server versus the database; they just need to make the state recoverable and eliminate a single point of failure.
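Whichever backing store is used, the common requirement is that the state can be turned into bytes and back. The sketch below (class names are mine) round-trips a session-scoped object through Java serialization, which is essentially what both in-memory replication and database-backed sessions rely on:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// State a container can replicate or persist must be Serializable.
public class SessionState implements Serializable {
    private static final long serialVersionUID = 1L;
    public final String cartId;
    public final int itemCount;

    public SessionState(String cartId, int itemCount) {
        this.cartId = cartId;
        this.itemCount = itemCount;
    }

    // What replication does under the hood: object -> bytes ...
    static byte[] toBytes(SessionState s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(s);
        oos.flush();
        return bos.toByteArray();
    }

    // ... and bytes -> object on the node (or restart) that recovers it.
    static SessionState fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        return (SessionState) new ObjectInputStream(new ByteArrayInputStream(b)).readObject();
    }
}
```

Whether those bytes go over the wire to a replica (WebLogic-style) or into a table (WebSphere-style) is an implementation choice, which is exactly the argument: the developer's only obligation is to keep the state serializable.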
It's a shame etc?
Again, I just don't agree. When I look at J2EE, I see its interfaces as the most useful piece. It's not about WebLogic or WebSphere.
These interfaces can make applications portable across platforms, giving companies, for the first time, the ability to build reliable applications using commodity developers at low cost whilst retaining the flexibility to deploy those applications on a wide variety of platforms.
And that is why J2EE is useful, and that's why it needs to be extended further to make this achievable for a much broader class of applications.
I agree with Bill.
JSR 117 is good. I would like to see more things like that.
If you do not need it, you do not have to use it.
It does not intend to modify existing J2EE APIs.
No one asks you to pay for it.
Let's discuss that a little bit further, shall we?
I didn't say that database is the only thing that can fail.
I said the database is the only one that needs recovery.
App servers you just restart: kill the process and launch it again, or maybe they offer you a management console.
The database needs recovery in case of failures.
Some recoveries are automatic, some are manual.
Again, you misquoted me with respect to the database's single point of failure; I was talking about the parallel shared-disk architecture.
Yes, there is a single point of failure; there's not yet a totally redundant architecture as far as I know. You can minimize the chances of such a failure in the ways you described, but you can't rule it out.
But let's discuss the substance: should you use an API to make the system highly available?
I don't know about Tibco RV 6, but I know about CORBA smart stubs (Visigenic, Iona).
Totally transparent to the programmer.
Highly available databases, likewise: developers don't write highly available SQL.
It is an implementation matter to make these things happen.
State persistence: well, don't you write against the same SDK in WebLogic and WebSphere?
As for why it would be a shame to put a J2EE layer on top of NonStop SQL, I'll tell you why:
While the relational model has a solid theoretical foundation and has been validated time and time again in practice, the same cannot be said of J2EE.
Moreover, NonStop SQL already has an interface: the SQL language.
You can buy an app server and let it stand separate and use the database services, but don't ruin the database.
So my point is: do we need a High Availability API, or should HA be a totally orthogonal aspect left to the implementation?
I'm waiting for comments on this, if you please.
You said "Highly available databases, also developers don't write highly available SQL. It is an implementation matter to make these things happen".
I agree 100% with you. I see no reason why this has to be an API. This does not mean, however, that I am not a supporter of "highly available" application servers... all I mean is that this should not be a burden put on developers. A developer should not be writing against APIs to ensure their applications run well in HA environments. Actually, isn't that the whole reason why J2EE is a better framework than CORBA, wherein service code had to be written?
I strongly emphasize the need for such an HA contract in the J2EE specification, though. But that contract should be part of the SPI, not the API. Let container vendors do the job of implementing this as an SPI, and I (the mortal developer) would just declare whether I need this service or not in my deployment descriptor.
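Something along these lines, say. To be clear, the `availability` element below is purely hypothetical, not part of any actual J2EE deployment descriptor DTD; it only illustrates the declarative style being argued for:

```xml
<!-- Hypothetical: no such element exists in any J2EE DTD -->
<session>
    <ejb-name>OrderService</ejb-name>
    <!-- imagined declarative knobs the container SPI would honour -->
    <availability>
        <failover>automatic</failover>
        <max-interruption-seconds>10</max-interruption-seconds>
    </availability>
</session>
```

The developer states the requirement; how the container meets it stays the vendor's problem.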
Application State may need recovery, it's not always practical to write stateless servers and that state may not be persisted to a database for various reasons.
What about messaging? Transactional message transports such as MQSeries also need recovery.
By parallel disk architecture, do you mean Oracle Parallel Server or HA physical disk topologies? If you mean disk topologies, then you can eliminate single points of failure; do you really think EMC could sell boxes for the sums they do if those boxes had single points of failure? For examples of building those topologies, look at the Redbooks from IBM describing their storage servers' physical architecture.
Smart proxies in CORBA are transparent to the user of the client stubs, not to the guy who had to sit down and write the smart proxies. I don't see why server vendors can't provide me with implementations of such smart proxies for common scenarios. Not everyone is suited to writing such proxies. If J2EE needs to be extended to accommodate these enhancements, what's the big deal? That was always the problem with CORBA: the skill set required to build the applications was too high. J2EE tries to lower that minimum skill level to a more manageable point. Time will tell whether they are successful or not.
Highly available databases.
I beg to differ: applications need to be written to handle database failover, and this handling is outside the scope of SQL. So must every application reinvent the wheel to do this? J2EE database connection managers usually provide more manageable solutions for this. Couple message beans to this and you've probably got a fully automatic, recoverable system with user-written code only for business logic; no failure-management code at all.
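The wheel in question is small but tedious, which is exactly why it belongs in the connection manager. A rough sketch, with `ConnectionSupplier` standing in for a pool's `getConnection()` (the interface and class names are invented for illustration):

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical abstraction standing in for a pool's getConnection().
interface ConnectionSupplier {
    Connection get() throws SQLException;
}

public class RetryingConnector {
    // Retry a failed connection attempt a few times (e.g. while the
    // database fails over to a standby) before giving up, instead of
    // pushing this handling into every piece of business logic.
    static Connection connect(ConnectionSupplier pool, int attempts, long waitMillis)
            throws SQLException, InterruptedException {
        SQLException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return pool.get();
            } catch (SQLException e) {
                last = e;                  // still failing over -- back off
                Thread.sleep(waitMillis);
            }
        }
        throw last;                        // failover took too long
    }
}
```

Every application that talks to an HA database pair needs something like this; having it once, in the container, beats having it reinvented in each project.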
This isn't about databases and SQL. If SQL was all we needed then we'd all still be writing client/server applications. Look at Tuxedo and Encina, for example, to see their HA features. Encina provides you with a 'transaction' log that you can use to recover your state; it provides messaging, RPCs, security, transactions, logging, simple databases, load balancing, and directory services. Starting to look like what the J2EE spec will evolve into, no?
Your last point is the most interesting. Sun are not specifying implementations, they are specifying interfaces. Some problems are not so standard and it's difficult to do a good set of interfaces for these but for others, a set of interfaces can be defined and I don't see this as a bad thing.
It is not always desirable to fall back on HACMP or Sun Cluster for HA. When higher levels of availability are required and you can't afford to wait for a database to recover, then if the J2EE vendors can provide solutions for some of these application types, I don't see what's wrong with that.
You're getting me totally confused.
Like this: "When higher levels of availability are required and you can't afford to wait for a database to recover, then if the J2EE vendors can provide solutions for some of these application types, I don't see what's wrong with that".
If you don't wait for the database to recover you're doomed.
Can you tell me what an app server, or an application in general, can do without waiting for the database to recover?
I totally lack imagination on this subject; my thinking is that you generate a screen:
-- sorry, but I can't do the work right now, please try back in 15 min --
The smart programmers of the smart CORBA proxies don't have an API in common across vendors.
J2EE vendors are free to provide all kinds of solutions, as long as they are transparent to the programmer, but adding another API and enlarging the already inflated specs is not a good thing, IMHO.
Just think about this:
"J2EE tries to lower that minimum skill level to a more manageable point".
Well, let's consider the following:
- J2EE BluePrints
- J2EE Patterns Site
- Vlad Matena's Applying Enterprise JavaBeans book
- all kinds of problems debated on this site.
So my thinking is that, for now, J2EE has managed to make the development of server-side applications look like rocket science. Quite unnecessarily.
It seems to me that adding more APIs is not going to help; you can bet on it.
I may have cached the database in memory, for example, and can therefore carry on running queries while the database restarts, until I need to do an update, which I queue to run asynchronously when the database is available. Caching and queueing the database modifications allow my application to carry on even when the database is down or restarting. Caching and messaging: two useful possible benefits of a J2EE product.
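The cache-and-queue idea can be sketched in a few lines. This is a deliberately naive illustration (class names are mine, the queue isn't durable, and a real implementation would issue the SQL in `flush()`):

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Reads are served from a local cache; writes are queued and drained
// to the database when it becomes reachable again.
public class WriteBehindCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Queue<String[]> pendingWrites = new ConcurrentLinkedQueue<>();
    private volatile boolean databaseUp = false;

    public String read(String key) {
        return cache.get(key);            // works even while the DB is down
    }

    public void write(String key, String value) {
        cache.put(key, value);            // visible to local reads at once
        pendingWrites.add(new String[] {key, value});
        if (databaseUp) flush();
    }

    public void onDatabaseRecovered() {
        databaseUp = true;
        flush();                          // drain queued updates in order
    }

    private void flush() {
        String[] write;
        while ((write = pendingWrites.poll()) != null) {
            // a real implementation would issue the SQL UPDATE here
        }
    }

    public int pending() { return pendingWrites.size(); }
}
```

The application keeps answering reads and accepting writes during the restart; only durability of the queued writes is deferred, which is the trade-off being debated below.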
I never said that building distributed systems was easy but what I do say is that while you still need a thinker on the team, most of the team members don't need to be.
What's obvious to me (and to yourself, possibly) will not be obvious to a lot of people. The J2EE interfaces allow functionality to be provided by a J2EE vendor that I would otherwise need to write and that many wouldn't have the experience or know-how to write.
The interfaces allow me to choose at a late stage the container implementation that best suits my needs, or even allow a vendor to provide multiple containers implemented for different RAS levels whilst maintaining a common programming model. The container SPI that is currently being standardized should allow third parties to write containers for special applications. These containers can run on top of the basic J2EE infrastructure provided by a vendor (JTA, security, JMX, JCA, etc.).
Interfaces are a good thing and interfaces that allow different implementations to be plugged in at will are an even better thing.
Well, if you have cached the database in memory, I assume you mean only partial information is cached, not the whole database.
Then, if you're going to queue updates, you're not going to be able to tell your user whether the transactions succeeded or failed.
If your "user" is a virtual asynchronous client system that just posted a message in a queue, then yes, you can wait to retry updates with the database.
But if your user is a real person waiting for the form submittal to come back with results, you're doomed.
And this is just the natural functionality of message queues; it's not an enhanced programming model.
If you run the queries against the cache you're going to get incomplete results; moreover, you're going to need a whole lot of duplicated functionality that already exists in the database engine.
You're not going to get that.
Please come back with a more realistic scenario if you will.
You might still be dreaming of pluggable containers that interoperate with different app server vendors, but I wouldn't be so sure it will come soon, if ever :)
Yes, interfaces are a good thing, but what I was talking about was separation of concerns, which is a more general and more important thing.
The HA features should therefore not be a part of the J2EE programming model.
You wanted a scenario where I could survive a database restart, and I gave you one. Cache the whole database: not always practical, but things like catalogs etc. could be cached or replicated locally.
Lots of applications need to cache database data for performance and/or availability reasons, and they need to be able to run queries against this. Future EJB containers should be able to do this, and past products such as IBM Component Broker did.
You could use products like TimesTen to implement your local queryable cache with little work.
Caching is a fact of life for most applications, and if J2EE servers can provide functionality to help with this, then fine by me.
Your comments on queueing: this happens every day. Order a book from Amazon; it says at the end that your order has been accepted, and that means queued. What's wrong with that? Very acceptable, and a synchronous client would accept this depending on the business use cases.
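That "order accepted" pattern is a one-queue sketch. The names below are invented for illustration; the synchronous request only enqueues and acknowledges, while fulfilment happens later, when the backend (database, warehouse system) is available:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The web tier enqueues and acknowledges; a background worker
// fulfils orders once the backend is reachable again.
public class OrderFrontEnd {
    private final BlockingQueue<String> orders = new LinkedBlockingQueue<>();

    // Called in the web request: cheap, fast, survives backend outages.
    public String submit(String orderId) {
        orders.add(orderId);
        return "Order " + orderId + " accepted"; // i.e. queued, not fulfilled
    }

    // Called by a background worker; blocks until an order is available.
    public String takeNext() throws InterruptedException {
        return orders.take();
    }
}
```

In production the in-memory queue would of course be a transactional transport such as MQSeries or a JMS provider, so the acknowledgment actually means something if the server crashes.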
It looks realistic to me. It's a pattern applicable to almost any B2C site you could think of.
I think we'll just have to agree to disagree at this point.
I'm very happy to disagree on this one.