Discussions

J2EE patterns: A simple data integration pattern: Lookup

  1. A simple data integration pattern: Lookup (12 messages)

    I propose this really simple (don't blame me ;-) data integration pattern, which I call "Lookup". It's just a basic sequence of select and insert operations.

    <code>

    class DataSource {
      ...
      public Entity lookupEntity(Entity anEntity) {
        Entity e = this.selectEntity(anEntity);   // try to find it first
        if (e == null) {
          this.insertEntity(anEntity);            // not there yet: create it
          e = this.selectEntity(anEntity);        // re-read the stored copy
        }
        }
        return e;
      }
      ...
    }

    ...

    DataSource externalDS = new DataSource(...);
    DataSource targetDS = new DataSource(...);

    ...

    Entity extEntity = externalDS.nextEntity();
    Entity targetEntity = targetDS.lookupEntity(extEntity);

    // process targetEntity

    ...

    </code>




    I found this useful for keeping my logic steady and my code neat.
    I hope you find it useful too.

    /k

    ps: suggestions/criticism are welcome ;-)
  2. It is simple and nice, and used in one form or another by almost every developer.
    But we all have to be careful: this pattern is not thread-safe (another process may insert the object before the current one does), and it does not guarantee database integrity (for the same reason).
    I think this pattern may be used as such in environments where ACID characteristics are not a mandatory requirement and execution speed and time to market are more important.
    To make it thread-safe, a synchronized keyword would possibly be enough (but on which object? that is the question), but to make it transaction-safe, I think it is enough to do the following:

      public Entity lookupEntity(Entity anEntity) {
        try {
          this.insertEntity(anEntity);
        } catch (SQLException e) {
          // ignore it if it is a primary key violation; rethrow anything else
        }
        return this.selectEntity(anEntity);
      }

    Obviously, that means we send one extra INSERT statement to the DB and pay the cost of the exceptions thrown, while most of the time the entry is already there.
    I would suggest, then, keeping the insert as separate functionality implemented in another operation.
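    The try-the-insert-first variant above can be run end-to-end if the data store is simulated in memory. A minimal sketch, with a made-up DuplicateKeyException standing in for the SQLException (primary key violation, SQLState "23000") a real JDBC driver would raise; all class and method names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DB's primary key violation.
class DuplicateKeyException extends Exception {}

// In-memory stand-in for the target table, so the logic runs without a database.
class InsertFirstDataSource {
    private final Map<String, String> table = new HashMap<>(); // key -> payload

    void insertEntity(String key, String payload) throws DuplicateKeyException {
        if (table.containsKey(key)) throw new DuplicateKeyException(); // PK violation
        table.put(key, payload);
    }

    String selectEntity(String key) {
        return table.get(key);
    }

    // KF's variant: try the INSERT first, swallow a duplicate-key failure,
    // then SELECT. The SELECT-then-INSERT race disappears as long as the
    // store enforces the primary key, at the cost of one extra INSERT
    // (and an exception) whenever the row already exists.
    String lookupEntity(String key, String payload) {
        try {
            insertEntity(key, payload);
        } catch (DuplicateKeyException ignored) {
            // already there, possibly inserted by another client just now
        }
        return selectEntity(key);
    }
}

public class LookupDemo {
    public static void main(String[] args) {
        InsertFirstDataSource ds = new InsertFirstDataSource();
        System.out.println(ds.lookupEntity("42", "first"));  // inserts; prints "first"
        System.out.println(ds.lookupEntity("42", "second")); // duplicate ignored; prints "first"
    }
}
```

    Note how the second call returns the stored row, not the incoming payload -- exactly the "lookup" semantics of the original pattern.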

    KF
  3. <Kostas-Flokos>
    To make it thread-safe, possibly a synchronized keyword would be enough (but on which object is the question
    </Kostas-Flokos>
    It's very common to hide such code behind a transaction-aware facade, like a stateless session bean in J2EE, instead of diving into developing yet another home-grown OLTP + O/R mapping framework.
    So this pattern is nothing more than a variation of the DAO + SSB facade -- "It is simple and nice and used in one form or another by almost any developer" ;-)
    IMHO, JDO + SSB will be more natural and simpler to use (don't reply to this phrase, there is a special-purpose flame in "Discussion" section on this site already).

    I must admit that after Floyd's book on J2EE patterns was published, the "Patterns" section on theserverside.com has degraded in some sense -- there are no interesting ideas any more, just small variations of well-known ones. Possibly because the book itself is really excellent ;-)
  4. <Valery-Silaev>
    I admit that after Floyd's book on J2EE patterns was published, the "Patterns" section on theserverside.com degrade in some sense -- there are no any interesting ideas any more, just small variations of well-known ones
    </Valery-Silaev>

    <Federico-K>
    I propose this really simple (dont blame me ;-)
    </Federico-K>

    This was intended to be a lightweight yet powerful implementation.

    However, I think the "degradation", as you call it, is due to the growing number of people approaching design pattern strategies -- which means the book is reaching its target audience and the community is actively reacting to its illuminating content.
    Next time I want to expose an incomplete idea, I'll post it in the EJB Design forum to try to get community approval first ;-)
    If someone wants to remove my post, I completely agree.

    <Valery-Silaev>
    Possibly, because the book itself is really excellent ;-)
    </Valery-Silaev>

    Of course it is.

    /k
  5. EJB-Lookup; cleanup?

    It's good practice not to program the hot path using exceptions, because the JVMs are not written to optimize that case. (Of course that may not matter much performance-wise when talking to a database across the network... but if you have a clever app server that does in-memory caching between transactions, it may)

    If the transaction isolation level is "serializable", database theory says you get repeatable reads and are protected from phantoms, so if the row isn't there the first time you look, it will still not be there by the time you insert it. You have to do both inside one transaction, of course, and you may experience deadlocks or failed transactions when two transactions try to "lookup" the same row at the same time.
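    The "do both inside one transaction" point can be modeled without a database by letting one intrinsic lock stand in for serializable isolation (in real JDBC, Connection.TRANSACTION_SERIALIZABLE); everything here -- class names, keys, the insert counter -- is illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: a single lock plays the role of the serializable isolation
// level, so the SELECT + INSERT pair executes atomically even when two
// clients "lookup" the same row at the same time.
class SerializableLookup {
    private final Map<String, String> table = new HashMap<>();
    final AtomicInteger inserts = new AtomicInteger(); // counts real inserts

    synchronized String lookupEntity(String key, String payload) {
        String e = table.get(key);       // SELECT
        if (e == null) {
            table.put(key, payload);     // INSERT -- the row provably still absent
            inserts.incrementAndGet();
            e = payload;
        }
        return e;
    }
}

public class SerializableLookupDemo {
    public static void main(String[] args) throws InterruptedException {
        SerializableLookup ds = new SerializableLookup();
        Runnable r = () -> ds.lookupEntity("42", "row-42");
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(ds.inserts.get()); // exactly 1 insert despite the race
    }
}
```

    A real database would resolve the same race by failing (or deadlocking) one of the two transactions rather than blocking it, which is what motivates the retry wrapper below in this thread.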

    So in an EJB context, I would add a home method with transaction mode "requires new" which first searches for the desired entity and, if not found, tries to create it. When using a DB that has optimistic locks, wrap this method with another method that does a number of retries (compare the block sequence pattern), with transaction mode "supports". Transaction failures are now isolated to the inner method, so you can safely call the outer one from your business transactions.
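    The outer/inner split can be sketched as a plain retry wrapper: the Callable stands in for the inner home method running in its own transaction (tx attribute "RequiresNew"), and the method name, retry count, and simulated failure are all illustrative, not from any spec:

```java
import java.util.concurrent.Callable;

public class RetryingLookup {

    // Axel's outer method ("supports"): call the inner transaction and,
    // if it fails -- e.g. an optimistic-lock conflict when two clients
    // "lookup" the same row -- simply try again. Failures stay isolated
    // to the inner call because each attempt is a fresh transaction.
    static <T> T withRetries(Callable<T> innerTx, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return innerTx.call();   // one attempt = one transaction
            } catch (Exception txFailure) {
                last = txFailure;        // rolled back; safe to retry
            }
        }
        throw last;                      // give up after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated inner transaction: fails twice (lock conflict), then succeeds.
        String row = withRetries(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("optimistic lock conflict");
            return "row-42";
        }, 5);
        System.out.println(row + " after " + calls[0] + " attempts"); // row-42 after 3 attempts
    }
}
```

    In a container, the two transaction attributes do the isolation work; the wrapper itself stays this small.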

    A fundamental problem with this kind of "lookup" operation is that your data never shrinks! When do you purge the created rows? This again may endanger concurrent transactions. Ideas welcome...

    Axel

    P.S.: There's actually more behind this pattern... you could also call it "memoized functional computation", and see where that leads you...
  6. I don't get it.

    I don't understand this pattern. If you already have the object, why do you have to look it up? If you didn't have the object in the first place (which is presumably why you're looking it up), how would you possibly have enough information to insert it?
  7. I don't get it.

    <David-Plass>
    I don't understand this pattern. If you already have the object, why do you have to look it up? If you didn't have the object in the first place (which is presumably why you're looking it up), how would you possibly have enough information to insert it?
    </David-Plass>

    This pattern is to be used in what I tend to call "Data Integration". You have a sourceDS to get "external" entities from (eg: JavaBeans filled out of a web form) and a targetDS to store them to, if not already there. Finally you get "target" entities to work with.

    <Federico-K>
    I found this useful in getting my logic steady and my code neat.
    </Federico-K>

    Ok. This one is to be considered a somewhat "abstract" programming pattern, not a design one. Maybe it's clearer to get the idea behind it if I call it a "pseudo-code snippet"?

    Btw, let me remind all you smart guys of the valuable KISS (keep it simple, stupid ;-) principle. IMHO this is one of the rules we'd better never forget, and I said *never*. Too many times we have seen over-architected/over-engineered solutions to simple problems, with obvious drawbacks, haven't we?

    As a rule of thumb, we tend to check both design and code understandability/simplicity every now and then, don't we?

    Btw, keep on refining my humble hint; interesting advice is coming out of it ;-)

    /k
  8. The pattern is simple and may work (with the provisos already listed in previous posts) but it is not efficient.

    In general, when loading new data into a target data source, you do not expect to find a lot of duplicates in the target data source. Doing a lookup before an insert results in two IO operations for every external entity (the lookup, then the insert). This is generally true even when there is caching because caches are finite and objects age out of them eventually. Thus, the posted pattern generally results in doubling the amount of work your server does when importing external data. The overhead is not too bad if you are loading only a small number of external entities. It becomes really bad when importing large numbers of external entities.

    A better pattern is the following: If you do not expect a lot of duplicates, identify duplicates by exception; i.e., try the insert and trap exceptions. If you do expect a large number of duplicates, break down the import into two steps: (1) identify all duplicates using a single query; (2) filter out and insert the non-duplicates--the point is: use bulk IO operations instead of looking at the external entities one at a time.
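    The two-step bulk import can be sketched with the target data source simulated as an in-memory set; in real code step (1) would be a single SELECT ... WHERE key IN (...) and step (2) a JDBC batch insert, and all names here are illustrative:

```java
import java.util.*;

public class BulkImport {

    // Step 1: one "query" identifying which incoming keys already exist
    // in the target (set intersection stands in for the IN-list SELECT).
    static Set<String> findDuplicates(Set<String> target, Collection<String> incoming) {
        Set<String> dups = new HashSet<>(incoming);
        dups.retainAll(target);
        return dups;
    }

    // Step 2: filter the duplicates out and insert the rest in bulk.
    static int bulkInsert(Set<String> target, Collection<String> incoming) {
        Set<String> dups = findDuplicates(target, incoming);
        int inserted = 0;
        for (String key : incoming) {
            if (!dups.contains(key)) {
                target.add(key);   // one addBatch() per row in real JDBC
                inserted++;
            }
        }
        return inserted;           // executeBatch() would go here
    }

    public static void main(String[] args) {
        Set<String> target = new HashSet<>(Arrays.asList("a", "b"));
        int n = bulkInsert(target, Arrays.asList("a", "c", "d"));
        System.out.println(n + " inserted, target now " + target.size() + " rows"); // 2 inserted, 4 rows
    }
}
```

    The point carries over directly: two bulk round trips total, instead of two per external entity.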

    If your app. server supports it, you can try using EJB-QL. If not, you will have to write custom code to query your target data source (presumably, this means writing JDBC calls). Even if your app. server supports EJB-QL, you may still have to write custom code because EJB-QL is fairly immature (compared to SQL).

    A related pattern is to insert the external entities into a temporary staging area and then filter out duplicates and move the "good" data using bulk methods (i.e., JDBC calls or similar).
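    A rough model of the staging-area variant, with collections standing in for the staging and target tables; the SQL in the comment (and the table names in it) is illustrative of what the bulk move would look like server-side:

```java
import java.util.*;

public class StagingImport {

    // Bulk move from staging to target; in SQL roughly
    //   INSERT INTO target SELECT * FROM staging s
    //   WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.key = s.key)
    // followed by DELETE FROM staging.
    static int moveGoodRows(Set<String> target, List<String> staging) {
        int moved = 0;
        for (String key : staging) {
            if (target.add(key)) moved++;  // Set.add plays the NOT EXISTS filter
        }
        staging.clear();                    // empty the staging table
        return moved;
    }

    public static void main(String[] args) {
        Set<String> target = new HashSet<>(Arrays.asList("a"));
        // staging may hold duplicates of the target AND of itself; both get filtered
        List<String> staging = new ArrayList<>(Arrays.asList("a", "b", "b", "c"));
        System.out.println(moveGoodRows(target, staging) + " rows moved"); // 2 rows moved
    }
}
```

    A side benefit the bulk step gives for free: duplicates *within* the incoming batch are filtered too, not just duplicates against the target.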
  9. <Ara-Kassabian>
    A better pattern is the following: If you do not expect a lot of duplicates, identify duplicates by exception; i.e., try the insert and trap exceptions. If you do expect a large number of duplicates, break down the import into two steps:......
    </Ara-Kassabian>

    ...well, it sounds good, but at the same time, don't you think porting data to a staging area is again going to SLOW down
    the overall effort?!

    -RK INDIA
  10. <RATHAKRISHNAN-K>
    ...well it sounds good at the same time, don't you think porting data to a staging area is again going to SLOW down
    the overall effort?!
    </RATHAKRISHNAN-K>

    Mumble, mumble... such an issue concerns memory usage rather than speed.

    As I'm constantly searching for lightweight yet powerful solutions, I'd suggest using some DBMS-dependent implementation (eg: Oracle stored procedures) hidden behind a simple DAOFactory class (ok, maybe it's better to use JDO technology ;-).

    One noteworthy remark: I'm talking about trying to optimize a programming pattern, not a design one.

    <Federico-K>
    If someone wants to remove my post I completely agree.
    </Federico-K>

    If someone *has time* to *move* my post to a more appropriate forum I completely agree.

    /k
  11. Relating to:

    'If your app. server supports it, you can try using EJB-QL. If not, you will have to write custom code to query your target data source...'

    -> Do you know: if there is a <query> defined in the generic DD (Deployment Descriptor) without an <ejb-ql> clause, is it then the case that the vendor-specific deployment descriptor is used by the container to generate a proper SQL clause? (The JBoss examples suggest so, but I cannot find this search algorithm in the spec...)
  12. An empty ejb-ql tag is allowed. After some more studying of the JBoss 3.0 docs, here is how to override EJB-QL with vendor-specific SQL (at least in the case of JBoss -- I assume this principle is applicable to other vendors too...):

    'The EJB-QL to SQL mapping can be overridden in the jbosscmp-jdbc.xml file. The finder or ejbSelect is still required to have an EJB-QL declaration in the ejb-jar.xml file, but the ejb-ql element can be left empty. Currently the SQL can be overridden with JBossQL, DynamicQL, DeclaredSQL or a BMP style custom ejbFind method.'

    Just fyi ...

      - Fernand.
  13. Just a few thoughts...

    This is a very common programming paradigm when using ISAM kind of data stores.

    Basically, the ability of this technique to work depends on the environment -- for example, on whether the underlying store has the ability not to insert duplicates, no matter how many different client contexts try it at once!

    Naturally, the serialization of the transaction will play a role in this, and the question about synchronization (at what level) still remains.

    Therefore, the ability of this "pattern" to be reusable in all such contexts is not clear.

    This is a common problem, and the use of Hashtable.put/get/containsKey methods is already similar to this.
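    The Hashtable analogy can be made concrete. Note that even though each Hashtable method is individually synchronized, the get/put pair is not atomic, so the whole pair needs its own lock -- which also answers the earlier "synchronized on which object?" question in miniature (on the map itself). A sketch, with illustrative names:

```java
import java.util.Hashtable;

// In-memory analogue of the Lookup pattern: "get, and put if absent".
public class HashtableLookup {
    private final Hashtable<String, String> table = new Hashtable<>();

    String lookup(String key, String value) {
        // Synchronize the whole SELECT/INSERT pair on the map itself;
        // without this, another thread could put between the get and the put.
        synchronized (table) {
            String e = table.get(key);      // SELECT
            if (e == null) {
                table.put(key, value);      // INSERT
                e = value;
            }
            return e;
        }
    }

    public static void main(String[] args) {
        HashtableLookup ds = new HashtableLookup();
        System.out.println(ds.lookup("42", "first"));  // inserts; prints "first"
        System.out.println(ds.lookup("42", "second")); // already there; prints "first"
    }
}
```

    This is also where the "memoized functional computation" remark earlier in the thread points: the map caches the result of "compute the row for this key" and returns the cached copy ever after.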