General J2EE: Data extration and replication with J2EE ?
Hi, J2EE-Experts !
- Posted by: Devon Miles
- Posted on: October 04 2005 08:03 EDT
I'm developing a web-based reporting platform with Java and J2EE technology. One objective of the solution is to generate reports based on the content of an existing propietary database (running on a MS SQL Server). To prevent the web-query functionality from pulling down the system's performance we would like to replicate in cyclic periods (approx. 2 minutes or so) the database changes of the production system to a database (MySQL) that serves only the reporting and query part.
Now my first idea was a cron-job (we are running Suse Linux Enterprise Edition) that starts a Standard Java-Application that replicates the "delta" that has been inserted into the source DB.
Is there any way or technique you would prefer ? Or is my approach reasonable ?
Thanks in advance
- Data extraction and replication with J2EE ? by Sohail Sikora on October 04 2005 14:55 EDT
What type of data is stored in the existing system? Do transactions have to be replicated? What type of reports do you need to generate?
You will have to use MS SQL Server for Reporting. This way, you can replicate the production database and use one instance for reporting and one for the application. I am not sure of any DB products that will replicate across different DBMS products (there maybe 3rd party tools that may do this).
If transaction replication is not needed, you are better off writing some kind of an Extract Transform Load (ETL) job to transfer data from a transaction processing DB to a reporting DB - on a schedule.
If the reports have to be run on the transaction data, for example printing of Invoices, shipping labels etc., you have no need to replicate. You have to go against the existing DB that is being used by the application.
If the reports contain aggregated data, then Approach 2 (in the absence of any other approach) is your best bet. With this approach, you will be able to use MySQL for your reporting needs.
First of all, thanks for answering Sohail !
Well, the data which this is all about is results of testtransactions which are used for monitoring our business applications concerning availability and performance.
I hope I understand the "replication of transaction" issue correctly. If you mean that I have to be careful that running transactions, that are the rolled back in the productive system should also be rolled back in the reporting system - that should not be the case.
I don't need all of the columns of the original data structure and I also could do some preliminary calculations to fill interim values (sums, averages, etc.) during the "replication" -process.
And the kind of reports: well basically, a website show the status of our applications during several periods of the past time. But I also have to keep the possibility of exports to pdf, xls or other well-known formats in mind.
I think so that there are 3rd Party tools dealing with "cross-database-platform-replication".
I was wondering if there is a "standard-J2EE" mechanism or component that is responsible for scheduled replication tasks.
So in the end - can I summarize that you would recommend a fine and small selfmade ETL programm that I schedule with cron on SuseLinux ? And there will be no bullets in my body from J2EE evangelists ? :-)
Thanks for bothering about all that !
You will always get a few bullets from the folks out there. However, your summary should be correct. A home made program scheduled using cron should be good. Also, as this does not contain any J2EE whatsoever, you have no reason to be worried about J2EE Evangelists ;-)
An easier solution would be to see if you can use Octopus. It is an open source ETL tool and can be found at http://octopus.objectweb.org/
I have never used it but it would be little or no risk as you can always write your own program if this does not help you.
I'll try that tool !