Building Content Oriented Integration Solutions With Mule and JCR

By David Dossot

01 Jun 2008 | TheServerSide.com

The latest version of the JCR transport for Mule ESB offers a set of features that enable the creation of content-oriented integration solutions.

These features include:

  • Storing content in a repository
  • Listening to changes in a content repository
  • Reading content from a repository

This article presents the implementation of a simple scenario where all these features are leveraged. The following is a short description of the context of this example.

A company wants to automate the scanning, storage, indexing and retrieval of a particular type of document it receives. An automated scanner creates a PNG file for each new document in a particular folder. Each document image has a unique file name, and each document carries a barcode that uniquely identifies it. After storage in the content repository, it must be possible to retrieve a picture of a document using its barcode ID as a parameter in a simple HTTP GET request. The incoming flow of documents peaks in the morning: the system must be able to handle this peak of activity without requiring too much memory. Priority must be given to picking up the files that the scanner stores in the deposit folder and saving them in the content repository. The barcode reading, which is a slow and memory-consuming process, must be performed asynchronously.

To achieve these goals and satisfy the constraints expressed above, the following design is implemented and detailed below:

  • The document image must be streamed from the file system to the JCR container to reduce memory footprint.
  • The system will use the repository observation mechanism to be notified of newly stored images in order to read their barcode ID asynchronously.
  • A parameterized XPath JCR query will be used to retrieve the document image from its ID.

Other transports are used alongside the JCR one: the File transport listens to a folder and the HTTP transport exposes the document image retrieval service. This illustrates how you can leverage the architecture of Mule to gather, transform and route content to and from a content repository.
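To make the asynchronous part of this design more concrete, here is a minimal plain Java sketch of the idea, independent of Mule and JCR. Each file is recorded immediately with an empty barcode ID, while the slow barcode reading is deferred to a single worker thread. The class name, the method name and the in-memory map standing in for the repository are all hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration: not part of Mule or of the JCR transport.
final class AsyncBarcodeSketch {

    /**
     * Stores every file name right away with an empty barcode ID, then lets a
     * single worker thread fill in the IDs produced by the slow reader.
     */
    static Map<String, String> storeThenReadBarcodes(List<String> fileNames,
                                                     Function<String, String> barcodeReader) {
        // stands in for the content repository: file name -> barcode ID
        Map<String, String> repository = new ConcurrentHashMap<>();
        ExecutorService barcodeWorker = Executors.newSingleThreadExecutor();
        try {
            for (String fileName : fileNames) {
                // fast path: the document is stored first, barcode ID still empty
                repository.put(fileName, "");
                // slow path: the barcode is read later, on the worker thread
                barcodeWorker.execute(
                    () -> repository.put(fileName, barcodeReader.apply(fileName)));
            }
        } finally {
            barcodeWorker.shutdown();
        }
        try {
            if (!barcodeWorker.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Barcode reading did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for barcode reading", e);
        }
        return repository;
    }
}
```

In the actual solution, the in-memory queue and the repository observation mechanism described in the following sections play the role of the worker thread used in this sketch.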

Storing content in a repository

The following fragment of Mule configuration shows how the standard Streaming Bridge Component can be used to stream the image from the file system to the content repository.

<mule-descriptor name="FileLoader"
  implementation="org.mule.components.simple.StreamingBridgeComponent">
  <inbound-router>
    <endpoint address="file://./inbox" streaming="true">
      <filter pattern="*.png" className="org.mule.providers.file.filters.FilenameWildcardFilter" />
    </endpoint>
  </inbound-router>

  <outbound-router>
    <router className="org.mule.routing.outbound.OutboundPassThroughRouter">
      <endpoint address="jcr://documents" streaming="true">
        <properties>
          <property name="alwaysCreate" value="true" />
          <property name="nodeRelPath" value="${originalFilename}" />

          <property name="nodeTypeName" value="my:document" />
          <property name="jcr:mimeType" value="image/png" />
        </properties>
      </endpoint>
    </router>
  </outbound-router>
</mule-descriptor>

Note that the originalFilename property, used for the relative path of the target JCR node where the document will be saved, is set by the inbound file endpoint to the name of the file that was picked up from the file system. Note also that, even though we expect different file names for the incoming document scans, the alwaysCreate property forces the creation of a new node even if one already exists under the same name.
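The benefit of streaming can be illustrated outside of Mule with a short plain Java sketch: the content is moved through a small fixed-size buffer, so memory usage stays constant whatever the size of the image. This is not the code of the Streaming Bridge Component, only an illustration of the principle; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of streaming with a bounded memory footprint.
final class StreamingCopySketch {

    /** Copies source to target through a fixed-size buffer and returns the byte count. */
    static long copy(InputStream source, OutputStream target, int bufferSize) {
        byte[] buffer = new byte[bufferSize]; // the only memory held during the copy
        long total = 0;
        try {
            int read;
            while ((read = source.read(buffer)) != -1) {
                target.write(buffer, 0, read);
                total += read;
            }
            target.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] fakeImage = new byte[100_000]; // stands in for a scanned PNG file
        ByteArrayOutputStream fakeRepository = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(fakeImage), fakeRepository, 8192);
        System.out.println("Copied " + copied + " bytes through an 8 KB buffer");
    }
}
```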

Here is the definition of the my:document node type (i.e. the content structure where the document image and the barcode ID will be stored):

[my:document] > nt:unstructured
+ imageFile
- barcodeId   (STRING)

The imageFile child node is of type nt:file; its jcr:content child node, of type nt:resource, holds the image data.

The JCR transport offers the possibility to enforce such a structure without the need to actually define it in the JCR container, an approach that is consistent with industry best practices for content modeling. To achieve this, the application must register a custom implementation of NodeTypeHandler with the JCR connector. This interface defines the methods an application can use to create and update content, and also to reach the node type manager where all the handlers are registered. The following code shows the implementation for the my:document node type described above:

public class DocumentNodeTypeHandler implements NodeTypeHandler {

    private NodeTypeHandler fileNodeTypeHandler;

    public void initialize(NodeTypeHandlerManager manager) {
        fileNodeTypeHandler = manager.getNodeTypeHandler("nt:file");
    }

    public String getNodeTypeName() {
        return "my:document";
    }

    public Node createNode(Session session, Node targetNode, String nodeRelPath, UMOMessage message) 
      throws RepositoryException, IOException {
        Node documentNode = targetNode.addNode(nodeRelPath, "nt:unstructured");
        fileNodeTypeHandler.createNode(session, documentNode, "imageFile", message);
        documentNode.setProperty("barcodeId", "");
        return documentNode;
    }

    public void updateContent(Session session, Node targetNode, UMOMessage message) 
      throws RepositoryException, IOException {
        throw new UnsupportedOperationException("Once a document has been saved, its content can not be updated");
    }
}

Notice how the barcodeId property is created empty when the parent node is created: this makes it easier to store the actual ID once it becomes available, because updating an existing property is easier than creating a new one with the JCR transport. Note also that the handler does not allow subsequent modifications of a node (this is consistent with the requirement that all files, hence all my:document nodes, have different names).

Listening to changes in a content repository

The asynchronous update of the barcode ID is implemented with two components: one that listens to JCR events and one that reads an image input stream to find the barcode ID. The first one, a standard Pass-through Component shown below, has the following notable configuration aspects:

  • the inbound endpoint is configured with standard JCR observation parameters: it listens only to the addition of new nodes of type nt:file anywhere under the /documents path, allowing locally induced events.
  • it also uses a transport-specific parameter that asks the listener to enrich the standard event object with content from the new node (in this case, we are interested in the UUID of the newly created node, so we can target it easily when writing the barcodeId property).
  • the outbound router performs a simple orchestration: it first sends the incoming event to the in-memory queue consumed by the barcode reader component, then chains its result to an endpoint in charge of writing the ID in the relevant property.
  • the endpoint that takes care of writing the barcodeId selects the targeted property by navigating from the node identified by the aforementioned UUID with a relative node path and a relative property path (these notions should be familiar to JCR users).
  • the nodeUUID parameter is set using an expression that is resolved at runtime, using a message property named documentFileNodeUUID that is set by the first component of the chain.

  <mule-descriptor name="NewDocumentEventListener" 
  implementation="org.mule.components.simple.PassThroughComponent">
   <inbound-router>
    <endpoint address="jcr://documents">
     <properties>
      <property name="eventTypes" value="1" />
      <property name="deep" value="true" />
      <property name="noLocal" value="false" />
      <list name="nodeTypeName">
       <entry value="nt:file" />
      </list>
      <property name="contentPayloadType" value="noBinary" />
     </properties>
    </endpoint>
   </inbound-router>
   <outbound-router>
    <router className="org.mule.routing.outbound.ChainingRouter">
     <endpoint address="vm://documentProcessorQueue" />
     <endpoint address="jcr:///">
      <properties>
       <property name="nodeUUID" value="${documentFileNodeUUID}" />
       <property name="nodeRelPath" value="../.." />
       <property name="propertyRelPath" value="barcodeId" />
      </properties>
     </endpoint>
    </router>
   </outbound-router>
  </mule-descriptor>
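The eventTypes value of 1 used in this configuration deserves a word of explanation. JCR 1.0 defines its event types as bit flags on javax.jcr.observation.Event (NODE_ADDED is 1), so several types can be combined into a single integer. The following plain Java sketch mirrors these constants locally so that it runs without a JCR library; the class and method names are only illustrative, and whether the transport accepts a combined value in eventTypes is an assumption based on the standard observation API.

```java
// Hypothetical illustration: the constant values mirror javax.jcr.observation.Event (JCR 1.0).
final class JcrEventTypesSketch {

    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 2;
    static final int PROPERTY_ADDED = 4;
    static final int PROPERTY_REMOVED = 8;
    static final int PROPERTY_CHANGED = 16;

    /** Combines several event types into one bit mask. */
    static int combine(int... eventTypes) {
        int mask = 0;
        for (int eventType : eventTypes) {
            mask |= eventType;
        }
        return mask;
    }

    /** Tells whether the given mask selects the given event type. */
    static boolean selects(int mask, int eventType) {
        return (mask & eventType) != 0;
    }

    public static void main(String[] args) {
        int mask = combine(NODE_ADDED); // the value configured in this example
        System.out.println("mask=" + mask + ", selects NODE_ADDED: " + selects(mask, NODE_ADDED));
    }
}
```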

The component in charge of reading the document image file stream to find its barcode ID is shown hereafter. It is a simple scripted component written in Groovy: the comments in it should make it pretty straightforward to understand, at least for someone with some knowledge of Mule's dispatching mechanism. Because a JCR path can contain URI reserved characters ([ and ]), the JCR endpoint URI is built with a specific helper method that takes care of escaping them.

 

  <mule-descriptor name="DocumentProcessor" 
  implementation="org.mule.components.script.jsr223.ScriptComponent" 
  inboundEndpoint="vm://documentProcessorQueue">
   <properties>
    <property name="scriptEngineName" value="groovy" />
    <text-property name="scriptText">
        // a JCR observation is a collection of events: the inbound endpoint
        // configuration only produces one
        jcrMessage = message.payload[0]

        // store the UUID in the message context for later usage
        message.setProperty("documentFileNodeUUID", jcrMessage.uuid)

        // get the document stream from the data property of the file node
        newDocumentEndpointURI = 
    org.mule.providers.jcr.JcrEndpointBuilder.newJcrEndpointURI(jcrMessage.getPath())

        // the payload is a map of JCR property names and values
        imageStream = eventContext.receiveEvent(newDocumentEndpointURI, -1).payload['jcr:data']
        barcodeId = descriptor.properties['barcodeReader'].getBarcodeId(imageStream)
        log.info('New barcode ID: '+barcodeId)

        return barcodeId
    </text-property>
    <!-- Injects the barcode reader into the scripting component -->
    <container-property name="barcodeReader" reference="barcodeReader" required="true" />
   </properties>
  </mule-descriptor>
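The escaping issue mentioned before the script can be reproduced with the JDK alone. The sketch below does not show the transport's helper method; it relies on the multi-argument constructor of java.net.URI, which percent-encodes the characters that are illegal in a URI path, square brackets included. The class name, the method name and the sample path (a same-name sibling index) are assumptions made for the illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration: this is not the helper provided by the JCR transport.
final class JcrPathEscapingSketch {

    /** Returns the given absolute JCR path with URI-illegal characters percent-encoded. */
    static String escapePath(String absoluteJcrPath) {
        try {
            // the multi-argument constructor quotes illegal path characters such as [ and ]
            URI uri = new URI("jcr", null, absoluteJcrPath, null);
            return uri.getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid absolute path: " + absoluteJcrPath, e);
        }
    }

    public static void main(String[] args) {
        // a path with an index, as a JCR repository uses for same-name siblings
        System.out.println(escapePath("/documents/scan[2]/imageFile/jcr:content"));
    }
}
```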

Here is an excerpt of the console log produced when a new document file has been stored in the repository. It shows how the first dispatcher connection retrieves the image stream from the path that was found in the JCR event. Then, after the barcode has been read (using Java4Less RBarcode Reader in this sample), its ID is logged by the script component and dispatched to another JCR endpoint that stores it in the relevant property.

10:51:30,227 INFO  [org.mule.providers.jcr.JcrMessageDispatcher] Connected: JcrMessageDispatcher{this=b8715e, 
  endpoint=jcr:///documents/letter-001.png/imageFile/jcr:content}
RBarcode Vision evaluation version
www.java4less.com
-------------------------------------------
WARNING! LIMITATIONS OF THE EVALUATION VERSION:
It will display this notice.
It will randomly replace some characters in the read value.
-------------------------------------------
Progress 0.0%
Progress 20.0%
Progress 20.0%
Progress 40.0%
Progress 40.0%
Progress 60.0%
Progress 60.0%
Progress 80.0%
Progress 80.0%
Progress 100.0%
10:51:33,128 INFO  [org.mule.components.script.jsr223.ScriptComponent] 
  New barcode ID: 202010123507123
10:51:33,134 INFO  [org.mule.providers.jcr.JcrMessageDispatcher] Connected: 
  JcrMessageDispatcher{this=ce81f5, endpoint=jcr:///}

Reading content from a repository

A global endpoint defining a parameterized JCR XPath query is used to read the document images back from the repository. Because the content of the jcr:data property is binary, the payload coming out of this endpoint will be a stream, ready to be sent back to the browser requesting the image. The id parameter will come directly from a parameter in the HTTP GET request.

  <endpoint name="DocumentImageQueryLookup" address="jcr:///" type="receiver">
   <properties>
    <property name="queryStatement" 
value="/jcr:root/documents/*[@barcodeId='${id}']/imageFile/jcr:content" />
    <property name="queryLanguage" value="xpath" />
    <property name="propertyRelPath" value="jcr:data" />
   </properties>
  </endpoint>
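Since the id value comes straight from an HTTP request, it is worth picturing what the parameter substitution amounts to. The following plain Java sketch fills the same query template and doubles any apostrophe in the value, which is how an apostrophe is escaped inside an apostrophe-delimited string literal in JCR 1.0 XPath queries. How the transport performs the substitution internally is not covered here; the class and method names are hypothetical.

```java
// Hypothetical illustration of the parameter substitution, not the transport's own code.
final class DocumentQuerySketch {

    static final String QUERY_TEMPLATE =
        "/jcr:root/documents/*[@barcodeId='${id}']/imageFile/jcr:content";

    /** Builds the XPath statement for the given barcode ID. */
    static String buildQuery(String barcodeId) {
        // an apostrophe inside an apostrophe-delimited literal is escaped by doubling it
        String escapedId = barcodeId.replace("'", "''");
        return QUERY_TEMPLATE.replace("${id}", escapedId);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("202010123507123"));
    }
}
```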

The following component hooks into the servlet container that hosts the application thanks to Mule's Servlet transport. Another Groovy script is used to retrieve the global endpoint (defined above) and receive from it the payload that must be used as the response to the HTTP request.

  <mule-descriptor name="DocumentServer" inboundEndpoint="servlet://document" 
  responseTransformer="PngContentTypeSetter UMOMessageToHttpResponse" 
  implementation="org.mule.components.script.jsr223.ScriptComponent">
   <properties>
    <property name="scriptEngineName" value="groovy" />
    <text-property name="scriptText">
       imageLookupEndpoint = managementContext.lookupEndpoint('DocumentImageQueryLookup')

       // rewrite the event to target a different endpoint
       org.mule.impl.RequestContext.setEvent(eventContext.session.createOutboundEvent(message, 
   imageLookupEndpoint, null))

       // return the document image stream
       return imageLookupEndpoint.receive(-1).payload
    </text-property>
   </properties>
  </mule-descriptor>

A chain of transformers is used to set the Content-Type header on the response and to transform the current UMOMessage into a valid HTTP response object. These transformers, which are standard ones, are configured as shown hereafter:

 <transformer name="UMOMessageToHttpResponse" 
  className="org.mule.providers.http.transformers.UMOMessageToHttpResponse" />

 <transformer name="PngContentTypeSetter" 
  className="org.mule.transformers.simple.MessagePropertiesTransformer">
  <properties>
   <map name="addProperties">
    <property name="Content-Type" value="image/png" />
   </map>
  </properties>
 </transformer>

It is then possible to retrieve a document image from a regular browser, by passing its barcode ID as the id parameter of an HTTP GET request.

Conclusion

Though simplistic, this example shows how the JCR Transport can be leveraged to lay the foundations of a proper document management system (DMS). It does fall short, however, in terms of the legal constraints that a full-fledged DMS has to handle (using a legally binding file format like CCITT III instead of PNG, taking care of mandatory archiving and encryption requirements, authenticating users in the web interface...).

That said, the capacity to involve JCR-compatible content repositories in integration scenarios, combined with the solid architecture of Mule ESB and its impressive number of available transports, opens up the possibility of creating new kinds of ad hoc content-oriented solutions.

To sum up, the JCR Transport for Mule offers a capable toolbox to anyone looking to make the most of their content repository and turn it into a first-class component of their IT landscape.

About the Author

David Dossot works for Riptown Media, a company that provides services to a major international digital entertainment group. David is the project despot of the Mule JCR transport and the project lead of NxBRE, a business rules engine for the .NET platform.