I propose this really simple (don't blame me ;-) data integration pattern, which I call "Lookup". It's just a basic sequence of select and insert operations.
<code>
class DataSource {
    ...
    public Entity lookupEntity(Entity anEntity) {
        Entity e = this.selectEntity(anEntity);
        if (e == null) {
            this.insertEntity(anEntity);
            e = this.selectEntity(anEntity);
        }
        return e;
    }
    ...
}
...
DataSource externalDS = new DataSource(...);
DataSource targetDS = new DataSource(...);
...
Entity extEntity = externalDS.nextEntity();
Entity targetEntity = targetDS.lookupEntity(extEntity);
// process targetEntity
...
</code>
I found this useful in getting my logic steady and my code neat.
Hope you find it useful too.
/k
PS: suggestions/criticism are welcome ;-)
-
A simple data integration pattern: Lookup (12 messages)
- Posted by: Federico K
- Posted on: July 22 2002 16:08 EDT
Threaded Messages (12)
- A simple data integration pattern: Lookup by Kostas Flokos on July 22 2002 18:22 EDT
- A simple data integration pattern: Lookup by Valery Silaev on July 23 2002 05:01 EDT
- A simple data integration pattern: Lookup by Federico K on July 23 2002 07:15 EDT
- EJB-Lookup; cleanup? by Axel Wienberg on July 24 2002 11:40 EDT
- I don't get it. by David Plass on July 24 2002 13:30 EDT
- I don't get it. by Federico K on July 24 2002 15:20 EDT
- Pattern is simple but not appropriate for bulk inserts by Ara Kassabian on July 25 2002 01:07 EDT
- Pattern is simple but not appropriate for bulk inserts by RK INDIA on July 25 2002 10:59 EDT
- Pattern is simple but not appropriate for bulk inserts by Federico K on July 25 2002 12:42 EDT
- Pattern is simple but not appropriate for bulk inserts by Fernand Rouwendaal on August 06 2002 02:54 EDT
- Pattern is simple but not appropriate for bulk inserts by Fernand Rouwendaal on August 07 2002 10:49 EDT
- A simple data integration pattern: Lookup by SRI TALLAVAJHALA on August 08 2002 21:14 EDT
-
A simple data integration pattern: Lookup
- Posted by: Kostas Flokos
- Posted on: July 22 2002 18:22 EDT
- in response to Federico K
It is simple and nice and used in one form or another by almost any developer.
But we all have to be careful: this pattern is not thread-safe (another process may insert the object between our select and our insert) and it does not guarantee database integrity (for the same reason).
I think this pattern may be used as such in environments where ACID characteristics are not a mandatory requirement and execution speed and time to market are more important.
To make it thread-safe, the synchronized keyword would possibly be enough (but on which object is the question?), and to make it transaction-safe I think it is enough to do the following:
public Entity lookupEntity(Entity anEntity) {
    try {
        this.insertEntity(anEntity);
    } catch (Exception sqlException) {
        // ignore it if it is a primary key violation: the row is already there
    }
    return this.selectEntity(anEntity);
}
Obviously, that means we send one extra INSERT statement to the DB and we are penalized with exceptions being thrown, etc., while most of the time the entry is already there.
I would suggest, then, keeping the insert as separate functionality implemented in another operation.
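For the thread-safety half, here is a minimal sketch of the synchronized variant, assuming all callers share a single DataSource instance inside one JVM (the only case the keyword helps with); it does nothing against other processes writing to the same database.
<code>
class DataSource {
    ...
    // Synchronizing on the DataSource instance itself makes the whole
    // select/insert sequence atomic for threads sharing this instance.
    public synchronized Entity lookupEntity(Entity anEntity) {
        Entity e = this.selectEntity(anEntity);
        if (e == null) {
            this.insertEntity(anEntity);
            e = this.selectEntity(anEntity);
        }
        return e;
    }
    ...
}
</code>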
KF
-
A simple data integration pattern: Lookup
- Posted by: Valery Silaev
- Posted on: July 23 2002 05:01 EDT
- in response to Kostas Flokos
<Kostas-Flokos>
To make it thread-safe, possibly a synchronized keyword would be enough (but on which object is the question
</Kostas-Flokos>
It's very common to hide such code behind a transaction-aware facade, like a stateless session bean in J2EE, instead of diving into developing yet another home-grown OLTP + O/R mapping framework.
So this pattern is nothing more than a variation of the DAO + SSB facade -- "It is simple and nice and used in one form or another by almost any developer" ;-)
IMHO, JDO + SSB would be more natural and simpler to use (don't reply to this phrase; there is already a special-purpose flame war in the "Discussion" section of this site).
I admit that since Floyd's book on J2EE patterns was published, the "Patterns" section on theserverside.com has degraded in some sense -- there are no interesting ideas any more, just small variations on well-known ones. Possibly because the book itself is really excellent ;-)
-
A simple data integration pattern: Lookup
- Posted by: Federico K
- Posted on: July 23 2002 19:15 EDT
- in response to Valery Silaev
<Valery-Silaev>
I admit that since Floyd's book on J2EE patterns was published, the "Patterns" section on theserverside.com has degraded in some sense -- there are no interesting ideas any more, just small variations on well-known ones
</Valery-Silaev>
<Federico-K>
I propose this really simple (don't blame me ;-)
</Federico-K>
This was intended to be a lightweight yet powerful implementation.
However, I think the "degradation", as you call it, is due to the growing number of people approaching design pattern strategies, which means the book is reaching its audience and the community is actively reacting to its illuminating content.
Next time I want to put forward an incomplete idea, I'll post it in the EJB Design forum first to get community approval ;-)
If someone wants to remove my post, I completely agree.
<Valery-Silaev>
Possibly, because the book itself is really excellent ;-)
</Valery-Silaev>
Of course it is.
/k
-
EJB-Lookup; cleanup?
- Posted by: Axel Wienberg
- Posted on: July 24 2002 11:40 EDT
- in response to Kostas Flokos
It's good practice not to program the hot path using exceptions, because JVMs are not written to optimize that case. (Of course, that may not matter much performance-wise when talking to a database across the network... but if you have a clever app server that does in-memory caching between transactions, it may.)
If the transaction isolation level is "serializable", database theory says you get repeatable reads and are protected from phantoms, so if the row isn't there the first time you look, it will still not be there by the time you insert it. You have to do both inside one transaction, of course, and you may experience deadlocks or failed transactions when two transactions try to "lookup" the same row at the same time.
So in an EJB context, I would add a home method with transaction mode "requires new" which first searches for the desired entity and, if not found, tries to create it. When using a DB that has optimistic locks, wrap this method with another method that does a number of retries (compare the block sequence pattern), with transaction mode "supports". Transaction failures are now isolated to the inner method, so you can safely call the outer one from your business transactions.
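A rough JDBC-flavoured sketch of that shape, assuming a plain javax.sql.DataSource rather than entity beans; the table and column names (ENTITY, ID, NAME), the method names and the retry count are all made up. In an EJB deployment the inner method would be the home method declared with trans-attribute RequiresNew (and invoked through the home interface so the container actually starts the new transaction), while the outer wrapper would be declared with Supports.
<code>
import java.sql.*;

public class LookupHelper {
    private javax.sql.DataSource ds;

    public LookupHelper(javax.sql.DataSource ds) {
        this.ds = ds;
    }

    // Inner operation: select-then-insert inside one SERIALIZABLE transaction,
    // so a row that is absent at the select is still absent at the insert.
    public void lookupOrCreate(long id, String name) throws SQLException {
        Connection con = ds.getConnection();
        try {
            con.setAutoCommit(false);
            con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            PreparedStatement sel = con.prepareStatement(
                "SELECT ID FROM ENTITY WHERE ID = ?");
            sel.setLong(1, id);
            ResultSet rs = sel.executeQuery();
            if (!rs.next()) {
                PreparedStatement ins = con.prepareStatement(
                    "INSERT INTO ENTITY (ID, NAME) VALUES (?, ?)");
                ins.setLong(1, id);
                ins.setString(2, name);
                ins.executeUpdate();
            }
            con.commit(); // may fail if a concurrent transaction created the row
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.close();
        }
    }

    // Outer operation: a few retries keep conflicts between concurrent
    // "lookups" of the same row isolated from the business transaction.
    public void lookupWithRetry(long id, String name) throws SQLException {
        SQLException last = null;
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                lookupOrCreate(id, name);
                return;
            } catch (SQLException conflict) {
                last = conflict; // likely a serialization failure or duplicate key
            }
        }
        throw last;
    }
}
</code>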
A fundamental problem with this kind of "lookup" operation is that your data never shrinks! When do you purge the created rows? This again may endanger concurrent transactions. Ideas welcome...
Axel
P.S.: There's actually more behind this pattern... you could also call it "memoized functional computation", and see where that leads you...
-
I don't get it.
- Posted by: David Plass
- Posted on: July 24 2002 13:30 EDT
- in response to Federico K
I don't understand this pattern. If you already have the object, why do you have to look it up? If you didn't have the object in the first place (which is presumably why you're looking it up), how would you possibly have enough information to insert it?
-
I don't get it.
- Posted by: Federico K
- Posted on: July 24 2002 15:20 EDT
- in response to David Plass
<David-Plass>
I don't understand this pattern. If you already have the object, why do you have to look it up? If you didn't have the object in the first place (which is presumably why you're looking it up), how would you possibly have enough information to insert it?
</David-Plass>
This pattern is to be used in what I tend to call "Data Integration". You have a sourceDS to get "external" entities from (e.g. JavaBeans filled in from a web form) and a targetDS to store them in, if they are not already there. Finally, you get "target" entities to work with.
<Federico-K>
I found this useful in getting my logic steady and my code neat.
</Federico-K>
OK, this one is to be considered a somewhat "abstract" programming pattern, not a design one. Maybe the idea behind it comes across more clearly if I call it a "pseudo-code snippet"?
Btw, let me remind all you smart guys of the valuable KISS (keep it simple, stupid ;-) principle. IMHO this is one of the rules we'd better never forget, and I mean *never*. Too many times we have seen over-architected/over-engineered solutions to simple problems, with obvious drawbacks, haven't we?
As a rule of thumb, we should check both design and code for understandability/simplicity every now and then, shouldn't we?
Btw, keep on refining my humble hint; interesting advice is coming out of it ;-)
/k
-
Pattern is simple but not appropriate for bulk inserts
- Posted by: Ara Kassabian
- Posted on: July 25 2002 01:07 EDT
- in response to Federico K
The pattern is simple and may work (with the provisos already listed in previous posts) but it is not efficient.
In general, when loading new data into a target data source, you do not expect to find a lot of duplicates in the target data source. Doing a lookup before an insert results in two IO operations for every external entity (the lookup, then the insert). This is generally true even when there is caching because caches are finite and objects age out of them eventually. Thus, the posted pattern generally results in doubling the amount of work your server does when importing external data. The overhead is not too bad if you are loading only a small number of external entities. It becomes really bad when importing large numbers of external entities.
A better pattern is the following: If you do not expect a lot of duplicates, identify duplicates by exception; i.e., try the insert and trap exceptions. If you do expect a large number of duplicates, break down the import into two steps: (1) identify all duplicates using a single query; (2) filter out and insert the non-duplicates--the point is: use bulk IO operations instead of looking at the external entities one at a time.
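A rough JDBC sketch of that two-step variant, with made-up table and column names (ENTITY, ID, NAME) and assumed Entity accessors (getId, getName); real code would also have to cap the size of the IN list by processing the external entities in chunks.
<code>
// assumes java.sql.* and java.util.* imports
public void bulkImport(Connection con, List entities) throws SQLException {
    // step 1: one query to find which keys are already in the target table
    StringBuffer marks = new StringBuffer();
    for (int i = 0; i < entities.size(); i++) {
        if (i > 0) marks.append(",");
        marks.append("?");
    }
    PreparedStatement select = con.prepareStatement(
        "SELECT ID FROM ENTITY WHERE ID IN (" + marks + ")");
    for (int i = 0; i < entities.size(); i++) {
        select.setLong(i + 1, ((Entity) entities.get(i)).getId());
    }
    Set existing = new HashSet();
    ResultSet rs = select.executeQuery();
    while (rs.next()) {
        existing.add(new Long(rs.getLong(1)));
    }

    // step 2: batch-insert only the non-duplicates
    PreparedStatement insert = con.prepareStatement(
        "INSERT INTO ENTITY (ID, NAME) VALUES (?, ?)");
    for (Iterator it = entities.iterator(); it.hasNext();) {
        Entity e = (Entity) it.next();
        if (!existing.contains(new Long(e.getId()))) {
            insert.setLong(1, e.getId());
            insert.setString(2, e.getName());
            insert.addBatch();
        }
    }
    insert.executeBatch();
}
</code>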
If your app. server supports it, you can try using EJB-QL. If not, you will have to write custom code to query your target data source (presumably, this means writing JDBC calls). Even if your app. server supports EJB-QL, you may still have to write custom code because EJB-QL is fairly immature (compared to SQL).
A related pattern is to insert the external entities into a temporary staging area and then filter out duplicates and move the "good" data using bulk methods (i.e., JDBC calls or similar).
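A minimal sketch of that staging step, again with made-up table names (ENTITY_STAGING, ENTITY): after bulk-loading everything into the staging table, one set-based statement moves only the non-duplicates across.
<code>
Statement stmt = con.createStatement();
// copy only the rows whose key is not already in the target table
stmt.executeUpdate(
    "INSERT INTO ENTITY (ID, NAME) " +
    "SELECT s.ID, s.NAME FROM ENTITY_STAGING s " +
    "WHERE NOT EXISTS (SELECT 1 FROM ENTITY e WHERE e.ID = s.ID)");
</code>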
-
Pattern is simple but not appropriate for bulk inserts
- Posted by: RK INDIA
- Posted on: July 25 2002 10:59 EDT
- in response to Ara Kassabian
<Ara-Kassabian>
A better pattern is the following: If you do not expect a lot of duplicates, identify duplicates by exception; i.e., try the insert and trap exceptions. If you do expect a large number of duplicates, break down the import into two steps:......
</Ara-Kassabian>
...well, it sounds good; at the same time, don't you think porting data to a staging area is again going to SLOW down the overall effort?!
-RK INDIA
-
Pattern is simple but not appropriate for bulk inserts
- Posted by: Federico K
- Posted on: July 25 2002 12:42 EDT
- in response to RK INDIA
<RATHAKRISHNAN-K>
...well, it sounds good; at the same time, don't you think porting data to a staging area is again going to SLOW down the overall effort?!
</RATHAKRISHNAN-K>
Mumble, mumble... such an issue concerns memory usage rather than speed.
As I'm constantly searching for lightweight yet powerful solutions, I'd suggest using some DBMS-dependent implementation (e.g. Oracle stored procedures) hidden behind a simple DAOFactory class (OK, maybe it's better to use JDO technology ;-).
One thing worth noting is that I'm talking about optimizing a programming pattern, not a design one.
<Federico-K>
If someone wants to remove my post I completely agree.
</Federico-K>
If someone *has time* to *move* my post to a more appropriate forum I completely agree.
/k
-
Pattern is simple but not appropriate for bulk inserts
- Posted by: Fernand Rouwendaal
- Posted on: August 06 2002 02:54 EDT
- in response to Ara Kassabian
Relating to:
'If your app. server supports it, you can try using EJB-QL. If not, you will have to write custom code to query your target data source...'
-> If there is a <query> defined in the generic DD (deployment descriptor) without an <ejb-ql> clause, do you know whether the vendor-specific deployment descriptor is then used by the container to generate a proper SQL clause? (The JBoss examples suggest so, but I cannot find this behaviour described in the spec...)
-
Pattern is simple but not appropriate for bulk inserts
- Posted by: Fernand Rouwendaal
- Posted on: August 07 2002 10:49 EDT
- in response to Fernand Rouwendaal
An empty ejb-ql tag is allowed. After studying the JBoss 3.0 docs some more, here is a way to override EJB-QL with vendor-specific SQL (at least in the case of JBoss - I assume the principle is applicable to other vendors too...):
'The EJB-QL to SQL mapping can be overridden in the jbosscmp-jdbc.xml file. The finder or ejbSelect is still required to have an EJB-QL declaration in the ejb-jar.xml file, but the ejb-ql element can be left empty. Currently the SQL can be overridden with JBossQL, DynamicQL, DeclaredSQL or a BMP style custom ejbFind method.'
Just fyi ...
- Fernand.
-
A simple data integration pattern: Lookup
- Posted by: SRI TALLAVAJHALA
- Posted on: August 08 2002 21:14 EDT
- in response to Federico K
Just a few thoughts...
This is a very common programming paradigm when using ISAM-style data stores.
Basically, whether this technique works depends on the environment - for example, on whether the underlying store is able to reject duplicates no matter how many different client contexts attempt the insert at once!
Naturally, the serialization of the transaction will play a role in this, and the question about synchronization (at what level) still remains.
Therefore, it is not clear that this "pattern" is reusable in all such contexts.
This is a common problem, and the familiar check-then-put use of Hashtable.put/get/containsKey is already similar to it.
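A tiny sketch of that in-memory analogue, just to make the parallel explicit: the same check-then-insert shape has the same race unless the whole sequence is synchronized on something all callers share.
<code>
import java.util.Hashtable;

class HashtableLookup {
    private Hashtable cache = new Hashtable();

    // Without the synchronized block, two threads can both see "missing"
    // and both put a value - the in-memory version of the double insert.
    public Object lookup(Object key, Object freshValue) {
        synchronized (cache) {
            Object v = cache.get(key);
            if (v == null) {
                cache.put(key, freshValue);
                v = freshValue;
            }
            return v;
        }
    }
}
</code>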