IBM has announced its plans to release the first product from Xperanto, an initiative to allow fetching information from many data sources at once (from sales records to documents stored in e-mail servers).
- Posted by: Dion Almaer
- Posted on: January 13 2003 07:36 EST
Microsoft and BEA are going down the same path, with Oracle arguing that fewer large databases are less expensive to maintain.
Although performance for these tasks has historically been too slow, vendors now claim that "improvements in querying technology over the past two years, coupled with faster hardware and networks, make federated data or enterprise information integration (EII) systems more credible".
Read the article here
What is the future path for talking to our data? Is the database tier going to get smarter again, or will we be doing the work in the application server tier?
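To make the federated-query idea concrete, here is a toy sketch of what such an engine does conceptually: the client asks one logical question, and a mediator decomposes it into per-source fetches and joins the results. The source names, schemas, and join method here are entirely invented for illustration; this is not Xperanto's actual API.

```java
import java.util.*;

// Toy sketch of a federated ("EII") query: one logical join whose inputs
// live in two different systems. All names and data are invented.
public class FederatedQuerySketch {

    // Source 1: a relational sales table, keyed by customer id.
    static final Map<String, Integer> SALES = Map.of("c1", 500, "c2", 120);

    // Source 2: documents from a mail server, tagged with customer id.
    static final Map<String, String> MAIL = Map.of("c1", "complaint.eml", "c3", "order.eml");

    // Conceptually: SELECT customer, amount, doc FROM sales JOIN mail USING (customer)
    public static List<String> joinSalesWithMail() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> sale : SALES.entrySet()) {
            String doc = MAIL.get(sale.getKey());   // probe the second source
            if (doc != null) {
                rows.add(sale.getKey() + "," + sale.getValue() + "," + doc);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Only customer c1 appears in both sources.
        System.out.println(joinSalesWithMail());
    }
}
```

A real engine pushes as much of the filtering and joining as possible down into the sources themselves; the interesting engineering is in that optimization, not in the join loop.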
- I hate to be the all time sceptic... by Karl Banke on January 14 2003 10:29 EST
- compare and contrast by Cameron Purdy on January 14 2003 23:03 EST
- Sounds simple - but is it really? by Jonathan Sharp on January 15 2003 07:46 EST
- Delos has a product which does this now by Colin Sampaleanu on January 15 2003 09:41 EST
- EII Offerings by Kam Cleary on February 06 2003 18:14 EST
...but having had a good look at similar products, I found some striking limitations that make them at least questionable.
The first is that they typically provide only read-only access to a data source. That isn't bad in itself, except that it leaves you with a very different path for querying data than for updating it.
Then, of course, different data sources may have different concurrency models, which can undermine the consistency of the data you get back, unfortunately without any way to notice it. And there are other nasty things, like race conditions between reads against different data sources.
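That consistency hazard can be shown in a few lines. The scenario below is invented for illustration: two sources hold values that should always sum to 100, each update is atomic within its own source, but a federated read has no transaction spanning both, so an update landing between the two reads yields a total that never existed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Invented illustration of a cross-source read anomaly: no real product's
// API is shown here, only the structural problem of two uncoordinated reads.
public class CrossSourceRaceSketch {

    static final AtomicInteger SOURCE_A = new AtomicInteger(70);
    static final AtomicInteger SOURCE_B = new AtomicInteger(30);

    // Moves 20 units from A to B: atomic per source, but there is
    // no transaction spanning both sources.
    static void transfer() {
        SOURCE_A.addAndGet(-20);
        SOURCE_B.addAndGet(20);
    }

    // Federated read: fetch A, then B, with no global snapshot.
    static int federatedSum(boolean updateLandsBetweenReads) {
        int a = SOURCE_A.get();
        if (updateLandsBetweenReads) transfer();  // simulates the race
        int b = SOURCE_B.get();
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(federatedSum(false));  // 100: the true total
        System.out.println(federatedSum(true));   // 120: a total that never existed
    }
}
```

Nothing in the result tells the caller the second reading is stale; that is exactly the "without any way to notice it" problem.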
Then there is the deployment question. Does the application reside locally, using native connections to the databases? That can cause problems with firewalls and the like in a large enterprise. Or does it sit on or near the servers, with the client connecting to a facade-like server over, say, SOAP? Then there is a lot of XML to pass around.
Finally there is performance. IBM claims they have done something about it, which is good, because this has been a major issue in such scenarios. If they can perform a clean join over multiple databases efficiently, that would go a long way toward convincing me.
But if the underlying data changes relatively slowly and you only need read-only access, these tools might offer some relief and speed up development.
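For that slow-changing, read-only case, even a trivial time-to-live cache in front of the federated source absorbs most of the query cost. This is a generic sketch, not a feature of any particular EII product:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Generic TTL cache sketch for slow-changing, read-only federated data.
public class TtlCacheSketch {

    // key -> { expiry time in millis, cached value }
    private final Map<String, Object[]> entries = new HashMap<>();
    private final long ttlMillis;

    public TtlCacheSketch(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached value if still fresh; otherwise refetches
    // from the (expensive) federated source and caches the result.
    public synchronized Object get(String key, Supplier<Object> fetchFromSource) {
        Object[] e = entries.get(key);
        long now = System.currentTimeMillis();
        if (e == null || (long) e[0] < now) {
            e = new Object[] { now + ttlMillis, fetchFromSource.get() };
            entries.put(key, e);
        }
        return e[1];
    }
}
```

The TTL is the explicit statement of how stale you are willing to be, which is exactly the trade these tools offer anyway.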
How does this compare with LiquidData from BEA?
What is the relationship to J2CA?
Coherence: Easily share live data across a cluster!
Looks like Xperanto is a competitor to Liquid Data.
I find the statements in the article oversimplify the idea of EII.
Having worked on a similar strategy a couple of years ago ("consolidate legacy data in a single database and enable information access"), I quickly discovered that much of the complexity lay in integrating the legacy back-end applications that contain the actual data.
The main complexity lies in the differing ways legacy applications work. Some provide APIs for synchronous or asynchronous querying, while others only produce file-based data once a day. Obviously this creates a fairly major design problem for the schema in the consolidation database: which fields are immediately available, and which are only updated daily? Even then, how many legacy applications can interface with XML natively, without development work? If the answer is that we will only interface with databases that can already work with XML, then the actual breadth of applications covered starts to look a little narrow in a real organisation.
When you then consider updates flowing back to the legacy applications, life gets even more complicated. Do the updates go out immediately or batched? Does the legacy application process them immediately or in batch mode? Is an immediate response to the update required?
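One common answer to the immediate-versus-batched question is a write-behind buffer in the integration layer: the caller gets an immediate acknowledgement, the update is queued, and the queue is flushed to the legacy application on its own batch schedule (e.g. the nightly file drop). The sketch below is generic and invented, not any vendor's API:

```java
import java.util.ArrayList;
import java.util.List;

// Invented write-behind sketch: immediate acknowledgement to the caller,
// deferred batch delivery to the legacy back end.
public class WriteBehindSketch {

    private final List<String> pending = new ArrayList<>();
    private final List<String> legacySystem = new ArrayList<>();  // stand-in for the real back end

    // Caller returns immediately; nothing has hit the back end yet.
    public synchronized void submitUpdate(String update) {
        pending.add(update);
    }

    // Invoked by the batch schedule; applies all queued updates at once
    // and returns how many were delivered.
    public synchronized int flush() {
        legacySystem.addAll(pending);
        int applied = pending.size();
        pending.clear();
        return applied;
    }

    public synchronized List<String> legacyContents() {
        return new ArrayList<>(legacySystem);
    }
}
```

Of course this only restates the trade-off: until the flush runs, the consolidated view and the legacy system disagree, which is precisely the consistency problem discussed above.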
The article's claim that implementation costs only tens of thousands is simplistic; I think tens of millions is more like it for a large organisation.
When we totalled our estimated costs, the company decided that it would prefer for the operators of the call centers to have multiple user interfaces that gave them direct access to multiple back end systems and have them rekey the same data everywhere...!!
Therefore, in answer to the question, I would say it would be nice if the database could get smarter and make smart connections to legacy systems' databases to resolve the issues above. To do that, though, the product needs to consider the real environment found in G5 organisations; just saying "XML" and "Developer Studio" for implementation will not automatically resolve these complex issues.
" ... When we totalled our estimated costs, the company decided that it would prefer for the operators of the call centers to have multiple user interfaces that gave them direct access to multiple back end systems and have them rekey the same data everywhere...!! .. "
That is the problem with viewing everything as data: it is very difficult to separate the data from the "logic" (processing, etc.). Applications, systems, call them what you want, should provide the access to the data; they know what is going on. To bypass them (which most reporting systems/tools do) is just asking for trouble.
Legacy systems are a problem. Most were written with a specific database and view in mind, and the business logic is typically tied to the view and/or the database. Unfortunately, many new systems are being created the same way (even in OO languages). So we are left with doing what you have done (which can very easily end in inconsistent data), doing some hacky screen-scraping, writing wrappers for legacy apps (essentially duplicating logic), or rewriting the apps in an OO way.
I cringe every time I hear someone say, "It is just a web app. I don't need to do all those things."
I would like to share my experience with an EII product named MetaMatrix. It actually federates across multiple data sources, including flat files and XML sources. What we are building now adds various EISs (e.g. SAP, Siebel) to that list.
MetaMatrix presents a common unified virtual database to all consuming applications, with JDBC as the API for accessing data. The only thing one needs to build is adapters for the different legacy systems, which convert the SQL query into an EIS-specific query and get the data out of the EIS.
The product has some interesting query optimization techniques, wherein one can pass volumetric hints to the query planner. Above all, it's a Java/J2EE-based product.
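The adapter idea described above can be sketched as follows. The interface and the stub back end are invented for illustration and are not MetaMatrix's actual connector API: the virtual database speaks SQL to consumers, and a per-source adapter translates a (here, radically simplified) predicate into whatever the back end understands.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Invented sketch of the per-source adapter pattern behind a virtual database.
public class EisAdapterSketch {

    // What an adapter must do: given a column filter (standing in for a
    // parsed SQL predicate), return the matching rows from its source.
    interface SourceAdapter {
        List<Map<String, String>> select(String column, String equalsValue);
    }

    // A stub standing in for an EIS back end (e.g. an SAP-like system):
    // it has no SQL engine, so the adapter applies the predicate via the
    // source's own access path, modeled here as an in-memory list.
    static class StubEisAdapter implements SourceAdapter {
        private final List<Map<String, String>> records;
        StubEisAdapter(List<Map<String, String>> records) { this.records = records; }

        @Override
        public List<Map<String, String>> select(String column, String equalsValue) {
            return records.stream()
                    .filter(r -> equalsValue.equals(r.get(column)))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        SourceAdapter sap = new StubEisAdapter(List.of(
                Map.of("customer", "c1", "region", "EU"),
                Map.of("customer", "c2", "region", "US")));
        // The virtual database parsed something like
        // "SELECT * FROM sap.customers WHERE region = 'EU'"
        // and delegates the predicate to the adapter:
        System.out.println(sap.select("region", "EU"));
    }
}
```

The value of the pattern is that consuming applications only ever see the JDBC/SQL face; each back end's quirks stay contained in its adapter.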
Delos has a product called CMX which does a lot of the same things. This domain (two-way transfer and consolidation of data from multiple data sources, some of which work in real time while others work in batch fashion, with potentially duplicated data) is very complex. Delos has been doing it for a while...
Some good comments in this thread. Yes, EII is a highly complex problem by its nature. Good system support can help, but there's no way to completely remove the complexity of the process. Any serious EII effort will require serious resources.
My company has experimented with some of the offerings of start-ups in this area in the past, which have been disappointingly primitive for the most part. We've seen very poor caching support, little or no optimization, and some offerings that do not even handle out-of-core execution at the mid-tier (while being marketed as capable of supporting data-warehousing-style queries). That said, IBM's offering is likely to blow a lot of the previous offerings out of the water, due to their database expertise in general, and because they have done much of the pioneering research in this area over many years. I would expect MS, Oracle and probably BEA (Liquid Data) to have good stories here eventually as well.
In any case, IBM's offering will likely be the death knell for start-ups in this area, if any of them are still alive and kicking.