Keeping Track of Entity Data Changes Between Loads and Stores

J2EE patterns: Keeping Track of Entity Data Changes Between Loads and Stores

  1. It is the job of an EJB container to automatically load and store entity data between remote interface calls. However, it is the job of the EJB developer to make sure that the data loaded in an Entity Bean is consistent with the backend data store at all times. If it is not, the Entity Bean can create data integrity problems. This situation may arise when data owned by the Bean is changed during the Bean's life and the changes are not reflected in the Bean's state.

    For example, an Entity Bean representing a parent record and a set of child records may have a remote interface method that adds a new child record. If the method's implementation immediately persists this record to the database, the Entity Bean becomes out of sync with its data store until a fresh load is performed. If similar data operations continue without subsequent loads, the Bean can no longer be relied on for the most up-to-date information.

    This pattern offers a solution for these kinds of problems.

    This pattern primarily applies to the situation described in the example above. The assumption is that an Entity Bean represents a parent record that has at least one set of related child records. Think of the relationship between an Order and its Line Items, where the Order is the parent record and may have one or more Line Items associated with it. The relationships may be much more complicated and nested many levels deep, but this pattern still applies once each individual relationship is considered. Thus, the discussion focuses on a single parent/children relationship, but it can be generalized to any situation where this type of relationship is involved.

    This pattern assumes that EJB developers correctly update parent record data during data setting operations targeted towards parent record data. If this is not the case, the Entity Bean code must be modified to update or set all the data after each data modification operation. It is also a good practice to follow the Aggregate Details Pattern to store the details of the Entity Bean. The examples in this discussion rely on the use of this pattern.

    In order to correctly capture the changes made to the child records, the EJB developers need to follow these steps:
    1. Update the Bean's child data
    2. Store the operation that was performed on the data
    3. Correctly apply the operation to the child data during ejbStore

    Steps 2 and 3 are by far the trickiest and therefore require a closer inspection.

    Let’s assume that we have a parent class named Shipment that can contain one or more schedules.

    public class ShipmentAccessor extends AccessorBase {
        // Class Data Members
        ...
        protected ChangeArrayList schedule = new ChangeArrayList();
        ...
    }

    ChangeArrayList is a special class derived from ArrayList whose sole responsibility is to keep track of data changes made to a piece of data stored in the list. It is a fully functioning class that has been tested and implemented in a large-scale project. If you would like to obtain a complete listing, drop me an e-mail at leo at stratos dot net.
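
    Since the full listing of ChangeArrayList is not reproduced here, the following is only a minimal sketch of how a change-tracking list of this kind might work. It is not the author's implementation. The class name ChangeTrackingList, the markLoaded method, and the choice to match records through the elements' equals/hashCode are assumptions made for illustration; the add, update, remove, getInserted, getUpdated and getDeleted names mirror the calls used in this pattern.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for ChangeArrayList; NOT the author's actual class. */
class ChangeTrackingList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Pending operations since the last load/store; elements are matched by equals().
    private final Set<E> inserted = new LinkedHashSet<>();
    private final Set<E> updated = new LinkedHashSet<>();
    private final Set<E> deleted = new LinkedHashSet<>();

    /** Adds an element and records a pending insert. */
    @Override
    public boolean add(E element) {
        if (deleted.remove(element)) {
            updated.add(element);   // row still exists in the store: treat as an update
        } else {
            inserted.add(element);
        }
        return super.add(element);
    }

    /** Replaces the stored element equal to the argument and records a pending update. */
    public boolean update(E element) {
        int index = indexOf(element);
        if (index < 0) {
            return false;           // unknown element: nothing to update
        }
        super.set(index, element);
        if (inserted.remove(element)) {
            inserted.add(element);  // never stored yet: remains a single insert
        } else {
            updated.remove(element);
            updated.add(element);
        }
        return true;
    }

    /** Removes an element and records a pending delete unless it was never stored. */
    @Override
    public boolean remove(Object o) {
        int index = indexOf(o);
        if (index < 0) {
            return false;
        }
        E element = super.remove(index);
        if (!inserted.remove(element)) {
            updated.remove(element);
            deleted.add(element);
        }
        return true;
    }

    // Each getter returns an iterator over a snapshot of one kind of pending operation.
    public Iterator<E> getInserted() { return new ArrayList<>(inserted).iterator(); }
    public Iterator<E> getUpdated()  { return new ArrayList<>(updated).iterator(); }
    public Iterator<E> getDeleted()  { return new ArrayList<>(deleted).iterator(); }

    /** Replaces the contents with freshly loaded data and forgets all pending operations. */
    public void markLoaded(Collection<? extends E> fresh) {
        super.clear();
        super.addAll(fresh);
        // Tracking sets are cleared last, so nothing from the load is left pending.
        inserted.clear();
        updated.clear();
        deleted.clear();
    }
}
```

    In this sketch a record that is added and then removed before the next store leaves no pending operation at all, which keeps the store step from issuing an insert followed by a delete for a row that never existed. Only add, update and remove are tracked; bulk methods such as addAll are deliberately left untracked here.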

    Following the Aggregate Details Pattern, the Shipment Entity Bean will be derived from the ShipmentAccessor class, thus allowing the complete graph to be returned by executing a single getAllDetails call from the Entity Bean and also enabling set/get code reuse inside the Bean.

    If we need to add a new schedule (represented by the ShipmentSchedule class) to the shipment currently stored in the ShipmentAccessor, all we need to do is add it to the schedule list, without immediately committing the change to the data store.

    public void addShipmentSchedule(ShipmentSchedule shipmentSchedule) {
        schedule.add(shipmentSchedule);
    }

    The same is true for updating and removing schedules:

    public void updateShipmentSchedule(ShipmentSchedule shipmentSchedule) {
        schedule.update(shipmentSchedule);
    }

    public void deleteShipmentSchedule(ShipmentSchedule shipmentSchedule) {
        schedule.remove(shipmentSchedule);
    }

    By capturing the data and all of the operations performed on it, we keep the Entity Bean valid at all times. This, however, places more responsibility on the Bean's ejbStore method that now needs to apply all the data changes that were made after the last store. Therefore, in addition to the current code, extra functionality needs to be added to handle storing all the data modifications. Since all the changes are known, it is relatively straightforward to apply them. The example below uses ChangeArrayList to show how this can be accomplished.

    public void ejbStore() {
        ...

        // Store all the schedule records
        deleteShipmentSchedules(schedule.getDeleted());
        insertShipmentSchedules(schedule.getInserted());
        updateShipmentSchedules(schedule.getUpdated());
    }

    Each of the methods that takes care of an individual data store operation (insert, update, delete) receives an iterator over only those objects that have had the specified operation applied to them. This way, each method needs to handle only one operation, which makes the code very simple to write, understand and reuse. A subsequent load should completely refresh the set of data (schedules in this case) and discard all of the operations previously recorded against it, because this pattern only works within a single load-store lifecycle of the Entity Bean.
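
    To make the shape of these per-operation methods concrete, here is a hedged sketch in which an in-memory map stands in for the schedule table. The ScheduleRecordSketch and ScheduleStoreSketch classes and their fields are hypothetical; in a real BMP bean each loop body would issue the corresponding SQL statement instead of touching a map.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical child record; the fields are assumptions for illustration. */
class ScheduleRecordSketch {
    final int id;
    final String carrier;

    ScheduleRecordSketch(int id, String carrier) {
        this.id = id;
        this.carrier = carrier;
    }
}

/** In-memory stand-in for the schedule table, keyed by record id. */
class ScheduleStoreSketch {
    final Map<Integer, String> rows = new TreeMap<>();

    /** Handles only records that were removed since the last store. */
    void deleteShipmentSchedules(Iterator<ScheduleRecordSketch> deleted) {
        while (deleted.hasNext()) {
            rows.remove(deleted.next().id);
        }
    }

    /** Handles only records that were added since the last store. */
    void insertShipmentSchedules(Iterator<ScheduleRecordSketch> inserted) {
        while (inserted.hasNext()) {
            ScheduleRecordSketch r = inserted.next();
            if (rows.putIfAbsent(r.id, r.carrier) != null) {
                throw new IllegalStateException("Duplicate schedule id " + r.id);
            }
        }
    }

    /** Handles only records that already exist and were changed. */
    void updateShipmentSchedules(Iterator<ScheduleRecordSketch> updated) {
        while (updated.hasNext()) {
            ScheduleRecordSketch r = updated.next();
            if (rows.replace(r.id, r.carrier) == null) {
                throw new IllegalStateException("No schedule with id " + r.id);
            }
        }
    }
}
```

    Because each method sees only one kind of operation, none of them has to test whether a row already exists before deciding what to do; a mismatch simply surfaces as an error.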

    Notice that a general usage pattern of the details object under the Aggregate Details Pattern as well as performance considerations (since each remote interface call invokes load/store operations) dictate the following use of the details objects:
    1. Find/create the bean
    2. Extract the details object (getAllDetails)
    3. Perform various data operations on the object
    4. Store the details in the Entity Bean (setAllDetails)
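
    The performance argument behind these four steps can be illustrated with a small simulation. This is only a sketch under the assumption stated above, namely that every remote interface call is bracketed by one load and one store; the SimulatedShipmentBean and ShipmentDetailsSketch classes are hypothetical and merely count those operations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical details object carrying the child records to the client. */
class ShipmentDetailsSketch {
    final List<String> schedules = new ArrayList<>();
}

/** Simulated bean: every "remote" call costs one load and one store. */
class SimulatedShipmentBean {
    private final List<String> schedules = new ArrayList<>();
    int loads = 0;
    int stores = 0;

    /** Fine-grained remote call: one load/store pair per schedule added. */
    void addSchedule(String schedule) {
        loads++;
        schedules.add(schedule);
        stores++;
    }

    /** Coarse-grained read: hands the whole graph to the client in one call. */
    ShipmentDetailsSketch getAllDetails() {
        loads++;
        ShipmentDetailsSketch details = new ShipmentDetailsSketch();
        details.schedules.addAll(schedules);
        stores++;
        return details;
    }

    /** Coarse-grained write: accepts all accumulated changes in one call. */
    void setAllDetails(ShipmentDetailsSketch details) {
        loads++;
        schedules.clear();
        schedules.addAll(details.schedules);
        stores++;
    }

    int scheduleCount() {
        return schedules.size();
    }
}
```

    With three schedules to add, calling addSchedule three times costs three load/store pairs in this simulation, whereas getAllDetails, three local additions to the details object, and one setAllDetails cost two pairs no matter how many changes are made in between.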

    Under this scenario, this pattern makes even more sense because of its ability to capture data changes made between each store operation. Let's imagine a situation where an EJB developer follows the general steps outlined above. However, in Step 3, s/he performs several data operations that involve updating existing data and inserting a new record (all this, of course, is done for a set of child records rather than the parent record). If special precautions are not taken, ejbStore will not be able to determine which child records were updated and which were inserted. This pattern allows EJB developers to seamlessly manage this information and to easily manipulate data no matter what operations have been performed against it.


  2. Leo, the pattern you described looks really good.
    I have a concern about it, though. You mentioned implementing the ejbStore method so that it stores all the changes properly. But whenever a business method (addShipmentSchedule) is invoked on the Entity Bean, ejbLoad and ejbStore are called automatically if a new transaction is started. If the transaction attribute on the Entity Bean methods is set to TX_REQUIRED, then all the business methods on the Entity Bean should be called from a single Session Bean method.
    But calling a single Session Bean method to update all the info will not suit a situation where I have multiple UI screens that encapsulate a single Entity Bean. Here I need to write all the changes to the Entity Bean after the user has gone through all the screens, deleting, inserting and updating various elements of the Entity Bean.

    My basic concern is that ejbLoad and ejbStore are called on every business method when the methods are invoked individually.
  3. I will try to answer some of the questions raised in the replies posted above.

    There are a couple of ways that this pattern can be implemented in real life:

    1. As stated in the pattern, all the changes are made to the accessor object that represents the complete data vector of the Entity Bean. The accessor is obtained at the beginning of the transaction/lifecycle by calling the getAllDetails() method and is modified throughout its existence. At the end, the accessor is persisted to the backend data store by calling the setAllDetails() method of the Bean, thus forcing ejbStore to be called. Between the getAllDetails() and setAllDetails() calls, the Bean itself is not modified -- just the accessor. Keep in mind that it is very costly to commit small changes to the Entity Bean one by one, since each one triggers store/load operations. It is much more efficient to modify the accessor's state and commit all of the changes at once.

    2. As suggested by Uday, a Session Bean can be created to wrap an Entity Bean. In this situation, the approach described above can be utilized but on a slightly smaller scale since the changes will be accumulated for one Session Bean method call only. Here, the pattern still proves useful since it eliminates the need to develop extra functionality to commit Bean's state changes (data updates, deletes, inserts) that would otherwise have to be implemented.

    3. Some EJB containers provide special tags in their vendor-specific deployment descriptors that let developers limit the number of loads and stores. In WebLogic, for example, two tags -- <is-modified-method-name> and <db-is-shared> -- can be added inside the <weblogic-enterprise-bean> tag in weblogic-ejb-jar.xml to describe when stores and loads should and should not happen. It is often a good idea to keep track of the Bean's changes and signal to the container when the Bean has changed and should be saved (this is actually what TopLink does for you; those of us without it have to reinvent the wheel). In this situation, since stores and loads may not happen on every remote interface call, this pattern enforces data integrity and consistency.

    TopLink is a great tool. Unfortunately, not all of us have the benefit of working with it. Thus, this pattern offers a solution for situations where such advanced middleware tools are not available to developers. Also, the cost of implementing this pattern is minimal if the framework for data tracking is already implemented. This is exactly what ChangeArrayList does. As you can see from the pattern description, keeping track of data using it becomes trivial.
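
    As a rough illustration of signalling changes to the container, the sketch below keeps a dirty flag and exposes it through a boolean method of the kind that <is-modified-method-name> would name. The class, the isModified method name, the carrier field and the containerStore helper are all hypothetical, and the helper only simulates the assumed container behaviour of skipping ejbStore when the flag is not set.

```java
import java.util.Objects;

/** Hypothetical bean fragment; only the dirty-flag bookkeeping is shown. */
class DirtyFlagBeanSketch {
    private String carrier;
    private boolean modified = false;
    int storeCount = 0; // counts simulated writes to the data store

    /** Marks the bean dirty only when the value actually changes. */
    public void setCarrier(String newCarrier) {
        if (!Objects.equals(carrier, newCarrier)) {
            carrier = newCarrier;
            modified = true;
        }
    }

    /** The method a descriptor entry is assumed to point the container at. */
    public boolean isModified() {
        return modified;
    }

    /** A fresh load means the bean matches the store again. */
    public void ejbLoad() {
        modified = false;
    }

    /** Writes the state (simulated by a counter) and clears the flag. */
    public void ejbStore() {
        storeCount++;
        modified = false;
    }

    /** Simulates what the container is assumed to do at the end of a call. */
    static void containerStore(DirtyFlagBeanSketch bean) {
        if (bean.isModified()) {
            bean.ejbStore();
        }
    }
}
```

    The same flag could be derived from a change-tracking list by treating the bean as modified whenever the list has any pending insert, update or delete.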

    This pattern describes a solution for keeping the data consistent as a part of normal EJB operation. It relies on the assumption that a number of data operations are performed on the Bean's accessor between the getAllDetails() and setAllDetails() calls. It is inefficient to use this pattern for keeping track of a single change that is immediately persisted to the data store.

    This pattern has no bearing on data modified by external processes, since the Bean will not be aware of those changes at ejbStore time. In fact, this is a very delicate problem that may require a separate discussion.
  4. Has anyone compared the performance of TopLink to these methods? Does TopLink ultimately use CMP? How are the issues with fine-grained entity beans avoided? Or are they just cached efficiently?
    I have heard that using TopLink adds up to 20% degradation in performance over BMP. Does that sound right?
  5. It will be much cheaper to buy a product like TopLink that can detect deletions, insertions and modifications. Your alternative of coding this stuff in pretty much every one-to-many bean is way more expensive than a license cost.

    Not affiliated with TopLink. Just think that it makes coding persistent stuff trivial.

  6. Hi Leo,

    I thought your pattern was describing how you would overcome the problem of data inconsistencies caused by an external process modifying the information in the database tables directly, without going through the EJB container, e.g. a nightly batch process.

    Do you encounter such problems? How do you resolve them?

    Ben
  7. Hi
    Keeping track of entity data changes is the responsibility of the transaction manager. Remember that the transaction manager controls one thread at a time. If, say, a batch load or update occurs, the transaction manager tells the container to load the data again. This is achieved through the context object.

    Correct me if I am wrong.