News: Content management. It’s easy to use incorrectly.

  1. Content management.  You're Doing it Wrong - By Joe Ottinger

    Content management. It’s easy to use incorrectly, so it’s worth reconsidering how it’s best used: in a way that saves people work instead of adding to it.

    The stories aren’t hard to find: a CMS is proclaimed a silver bullet in a land of werewolves, and management tells developers to use it. Resources on using content management properly are hard to find, though, so developers naturally use the CMS as if it were a hammer; the application gets shattered, and updates become rare and ineffective.

    What a CMS can, or should, do

    A CMS provides a certain number of benefits, based on a few core concepts:

    1) Schema-less data (meaning flexible data models)

    2) Versioned data (meaning a history of changes to the data)

    3) Hierarchical data structures

    4) Typeless data (meaning content of any type, text or binary)

    A fully functional CMS would almost certainly also need search capabilities on top of these.
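
    As a rough illustration of those four concepts plus search, here is a minimal in-memory sketch in Java (17 or later). `ContentNode` and its methods are hypothetical names invented for this example, not the API of Jackrabbit or any other product. Properties are an open map (schema-less), values can be any object including raw `byte[]` (typeless), `save()` snapshots the properties (versioned), and nodes nest by name (hierarchical).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory content node; not the API of any real CMS.
class ContentNode {
    private final String name;
    // Schema-less and typeless: any property name, any value type (String, byte[], ...).
    private final Map<String, Object> properties = new LinkedHashMap<>();
    // Versioned: each save() appends a shallow snapshot of the properties.
    private final List<Map<String, Object>> history = new ArrayList<>();
    // Hierarchical: named children, kept in insertion order.
    private final Map<String, ContentNode> children = new LinkedHashMap<>();

    ContentNode(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Returns the existing child with this name, or creates it.
    ContentNode addChild(String childName) {
        return children.computeIfAbsent(childName, ContentNode::new);
    }

    ContentNode child(String childName) {
        return children.get(childName); // null if absent
    }

    void set(String key, Object value) {
        properties.put(key, value);
    }

    Object get(String key) {
        return properties.get(key);
    }

    // Records the current properties as a new version; returns its 1-based number.
    int save() {
        history.add(new LinkedHashMap<>(properties));
        return history.size();
    }

    // Returns the properties as they were at version n (1-based).
    Map<String, Object> version(int n) {
        return history.get(n - 1);
    }

    // Naive depth-first search: collects nodes whose String properties contain the term.
    void search(String term, List<ContentNode> hits) {
        for (Object value : properties.values()) {
            if (value instanceof String s && s.contains(term)) {
                hits.add(this);
                break;
            }
        }
        for (ContentNode c : children.values()) {
            c.search(term, hits);
        }
    }
}
```

    Note that the search deliberately skips non-`String` values. That only works because the store keeps binary values as binary, which is the point the XML discussion below turns on.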

    The actual data structures underlying these features are irrelevant; you could use a tuple space[1], a specialized content repository[2], an object database[3], a relational database[4], or even XML.

    XML deserves some special attention here, for two reasons.

    One is a strength: XML, as a hierarchical document structure, mirrors almost exactly what a CMS should present to an application layer, and XPath (and its cousin, XQuery) is a natural and efficient mechanism for retrieving and specifying data held in a hierarchical store.
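
    To make the XPath point concrete, the sketch below queries a small, made-up content tree with the JDK’s standard `javax.xml.xpath` API. The document structure and the `titlesForTopic` helper are assumptions for illustration; a real CMS would expose its own query layer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathContentDemo {
    // Hypothetical content tree; element and attribute names are illustrative only.
    static final String CONTENT = """
            <site>
              <content>
                <article id="a1" topic="java"><title>Sling basics</title></article>
                <article id="a2" topic="xml"><title>XPath in practice</title></article>
                <article id="a3" topic="java"><title>Tuple spaces</title></article>
              </content>
            </site>
            """;

    // Returns the titles of all articles tagged with the given topic, in document order.
    static List<String> titlesForTopic(String topic) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Concatenated for brevity; don't splice untrusted input into an XPath expression.
        String expr = "/site/content/article[@topic='" + topic + "']/title";
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(CONTENT)), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression: " + expr, e);
        }
    }
}
```

    The single expression both navigates the hierarchy and filters on an attribute. That is why a path language maps so naturally onto a hierarchical store.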

    However, XML isn’t a very good general content storage mechanism, because of the “typeless data” requirement most applications have. XML is an excellent representation of a hierarchical store, but it’s also designed to represent everything in text form, so storing an image in raw form is out. You’d have to encode the image as base64 or some other text-compatible form, and then your generalized searches would get mangled.[5]
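
    The mangling problem from footnote [5] is easy to reproduce. This hypothetical sketch hex-encodes some image bytes into a text field. It uses hex rather than base64 to match the footnote’s 0xdeadbeef example. It then compares a naive search across every field with one restricted to fields known to hold text. The field names and helper methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TextEncodedBinaryDemo {
    // Hex-encodes bytes, as a text-only store would force us to do for binary data.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Naive full-text search over every field, binary or not.
    static List<String> naiveSearch(Map<String, String> fields, String term) {
        List<String> hits = new ArrayList<>();
        fields.forEach((name, value) -> {
            if (value.contains(term)) {
                hits.add(name);
            }
        });
        return hits;
    }

    // Search restricted to fields the caller knows hold human-readable text.
    static List<String> textOnlySearch(Map<String, String> fields, List<String> textFields, String term) {
        List<String> hits = new ArrayList<>();
        for (String name : textFields) {
            String value = fields.get(name);
            if (value != null && value.contains(term)) {
                hits.add(name);
            }
        }
        return hits;
    }

    static Map<String, String> sampleDocument() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Chicken recipes");
        doc.put("body", "No red meat here.");
        // Hypothetical image bytes 0xDE 0xAD 0xBE 0xEF, which hex-encode to "deadbeef".
        doc.put("image", toHex(new byte[] {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF}));
        return doc;
    }
}
```

    Searching the sample document for “beef” naively reports a hit in `image`, while the text-only search reports none. The restricted search works only because the caller already knows which fields are binary. A store with genuinely typeless data would track that itself.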

    Using a CMS efficiently

    The way to use a CMS efficiently is to really use it to store versioned data, including as much of the content as possible.

    Some content doesn’t lend itself to content management: a log of events, for example, pleads for a relationally ordered system. Information that will end up in a tabulated report somewhere generally wants to live in a relational system, since databases are ideal for regurgitating static data.[6]

    Beyond that, though, if you have a content management system in place, you’ll benefit from thinking of all of your presented data as content. That means your page layouts, your transformation elements, almost everything but your actual processes can and should live in your content management system.

    Consider this kind of web page, as a DOM:


    - Sponsored Section
    - Header
        - Topic Bar
        - User Panel (user login, log out)
        - Search Panel
        - Page Title
        - Social interactions (add content, etc.)
    - Content
        - Content Body
        - Comment block
    - Sponsored Section
    - Footer

    Setting aside which site you might be reading right now that this structure resembles, consider that the structure might change. The workflow can be altered; adding content might lead to any number of possibilities (each with its own flow).

    Not only can all of those elements be represented in a content management store almost directly[7]; the structure itself can live in the content management system, as can the workflow, the images, and any ancillary content elements[8] your application needs or contains.
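
    Here is one way the page structure above might look once it lives in a repository. This is a sketch under loose assumptions. It uses a hypothetical path-keyed store where each entry holds display text and an ordered list of child paths. The content region refers to `/content/article1` rather than copying it, in the spirit of footnote [7]. The paths and class names are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LayoutAsContentDemo {
    // Hypothetical repository entry: text to display plus ordered child paths.
    record Entry(String text, List<String> children) {}

    // Hypothetical in-memory repository keyed by path.
    final Map<String, Entry> repo = new LinkedHashMap<>();

    LayoutAsContentDemo() {
        repo.put("/layout", new Entry("page", List.of("/layout/header", "/layout/content", "/layout/footer")));
        repo.put("/layout/header", new Entry("Header", List.of("/layout/header/title")));
        repo.put("/layout/header/title", new Entry("Page Title", List.of()));
        // The content region holds a reference into /content rather than a copy of it.
        repo.put("/layout/content", new Entry("Content", List.of("/content/article1")));
        repo.put("/content/article1", new Entry("Article body", List.of()));
        repo.put("/layout/footer", new Entry("Footer", List.of()));
    }

    // Renders the subtree at path as an indented outline, following references.
    void render(String path, int depth, StringBuilder out) {
        Entry e = repo.get(path);
        if (e == null) {
            return; // dangling reference: skip it rather than fail the whole page
        }
        out.append("  ".repeat(depth)).append(e.text()).append('\n');
        for (String child : e.children()) {
            render(child, depth + 1, out);
        }
    }

    String render() {
        StringBuilder out = new StringBuilder();
        render("/layout", 0, out);
        return out.toString();
    }
}
```

    Replacing the `/layout` entry with a reordered child list changes the rendered page with no code change. The structure is just more data in the store.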

    So when you discover that your application is “suboptimal,” shall we say (other terms might include “broken,” “wrong,” or “dumb”), a content management system used well puts you in a good position to fix it. You can alter an application based on content management fairly easily, because you don’t have a deployment step so much as a publication step.

    You would copy the relevant content structures to a development repository (an analogue to git clone or svn checkout, or perhaps the creation of a local branch), alter the content, test, and then publish those changes to the “main tree.” Since the application presentation layer is dynamic in nature and pulled from the content repository, the application can then change as the repository changes.
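
    The checkout, edit, test, publish cycle described above can be sketched with plain maps standing in for the main and development repositories. This is a hypothetical simplification. Real repositories (JCR workspaces, for example) have their own copy and merge mechanisms, and nothing here mirrors a specific product’s API. `checkout` copies a subtree, and `publish` replaces that subtree in the main store, so both edits and deletions made in the working copy take effect.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical publish cycle over a path-keyed content store.
class PublishCycleDemo {
    // True if path is prefix itself or lies beneath it ("/layout" matches "/layout/x" but not "/layoutX").
    static boolean inSubtree(String path, String prefix) {
        return path.equals(prefix) || path.startsWith(prefix + "/");
    }

    // Copies every entry in the subtree at prefix into a new working copy.
    static Map<String, String> checkout(Map<String, String> main, String prefix) {
        Map<String, String> work = new TreeMap<>();
        main.forEach((path, value) -> {
            if (inSubtree(path, prefix)) {
                work.put(path, value);
            }
        });
        return work;
    }

    // Replaces the subtree at prefix in main with the working copy's contents.
    static void publish(Map<String, String> main, Map<String, String> work, String prefix) {
        main.keySet().removeIf(path -> inSubtree(path, prefix));
        main.putAll(work);
    }
}
```

    Until `publish` runs, the main store is untouched, so the working copy can be tested in isolation first.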

    Believe it or not, this is already being done: Apache Sling uses this concept, and uses it successfully. It’s not difficult, and it’s very flexible; major changes to the presentation layer can be made with little work.

    That should make you, the developer, happy, because it gives you time to focus on things that interest you more. Your productivity goes up because you’re not wasting time altering a presentation layer.

    That should make your architects happy because it locates data where it makes the most sense for it to be located, in ways that are easy to approach.

    It should make your operations support team happy, because “application deployments” aren’t necessary for presentation-level changes; deployments can be focused on actually publishing fundamental changes instead of, well, just adding the ability to display Java source code properly or something.

    It should make your management team happy because even radical changes (from the users’ perspective) can be made without significant investment of time and money.

    Everyone wins, except perhaps some miscreant developers who want to create more work for themselves to create the illusion that they’re more valuable than they actually are.[9]

    It’s still possible to overuse a content management system, but approached wisely, most of the dangers are easily mitigated because they’ll be obvious (or well known). This is especially true if you avoid treating a content management system the way people treat relational databases today: as a hammer that makes everything look like a nail.

    You’re better off being willing to use multiple datastores (e.g., relational databases for logged events or specifically ordered data, and a content management system for any and all hierarchical data, with the relational database pointing to the content management system in places) than pretending a single datastore is perfect.
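
    A minimal sketch of that mixed arrangement uses in-memory stand-ins. A list of rows plays the relational event log, which in practice would be a table queried with `ORDER BY`. A map from path to title plays the content store. `EventRow` and `report` are hypothetical names. The log owns the ordering, the content store owns the content, and the `contentPath` column is the pointer between them.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class MixedStoresDemo {
    // A row as it might sit in a relational event-log table; contentPath points into the CMS.
    record EventRow(long id, Instant at, String action, String contentPath) {}

    // Builds "action: title" lines in time order, resolving each path against the content store.
    static List<String> report(List<EventRow> log, Map<String, String> contentTitles) {
        List<EventRow> ordered = new ArrayList<>(log);
        ordered.sort(Comparator.comparing(EventRow::at)); // ordering is the relational side's job
        List<String> lines = new ArrayList<>();
        for (EventRow row : ordered) {
            String title = contentTitles.getOrDefault(row.contentPath(), "(missing content)");
            lines.add(row.action() + ": " + title);
        }
        return lines;
    }
}
```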

    A relational database will never be as good as a content management system for schemaless or hierarchical data; a content management system will never be as good as a relational database for ordering simple data.

    The last thing to keep in mind: all of this is really pretty easy. If it’s not easy or efficient the way you’re doing it now, just remind yourself: it is easy, and you’re probably doing it wrong.

    The good news is that the right way to do it is really simple, and migrating to it shouldn’t be that hard.

    [1] Tuple spaces are my favorite idea, of course, since I work for GigaSpaces, a company whose application platform capabilities center around tuple spaces…

    [2] Jackrabbit, for example, or a product built on it, such as Magnolia, Alfresco, or CRX.

    [3] Two possibilities would be db4o and Perst.

    [4] I don’t think I can really accurately enumerate the set of relational databases.

    [5] Imagine a search for “beef” in an XML document that stored images as well as text; 0xdeadbeef, a perfectly valid hexadecimal value, would match “beef.”  While you could always restrict the query to nodes matching a certain structure – and you’d probably want to, really – it highlights the sort of mangling problems that storing binary data in a text-oriented format would incur.

    [6] Just in case you can’t tell: I tolerate relational databases, but I’m not overly fond of them. To me, they represent a fantastic waste of time for most applications, although they are fantastic at generating report data.

    [7] A natural data hierarchy should be easy to deduce: for me, I see /sponsored/ads, as well as /topology/topic1 with references to /content/{specific node here}. However, all of this is beyond the scope of this article, and goes into the nuts and bolts of actually using a content store, instead of addressing the overall architectural reliance on a content store.

    [8] Think “JavaScript” or “Cascading Style Sheets,” for example, although you don’t have to stop there.

    [9] This is the “Dune” principle: Frank Herbert had Paul Atreides point out that he who has the ability to destroy something valuable has control of that thing. In this case, developers are inflating their value to the enterprise by making sure they have the ability to destroy it through incompetence. Don’t be these people.

    Edited by: Cameron McKenzie on Mar 8, 2011 1:12 PM
  2. JeaseCMS (http://jease.org) is an example of a CMS built on top of the popular object databases db4o and Perst; it combines the power of static typing with schema-less data definition:

    Object databases are a very good match for creating and persisting the hierarchical (statically typed) foundation for structured content types (type safety, performance). This foundation can be extended with dynamic properties at runtime (per instance or via factory-based prototyping).
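
    A rough Java analogue of that combination follows; it is not Jease’s actual classes. It has typed fields the compiler checks, a per-instance map for properties added at runtime, and a factory whose prototype map seeds each new instance. `Article` and `ArticleFactory` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed content class; not Jease's actual API.
class Article {
    // Statically typed fields: known and checked at compile time.
    private final String id;
    private String title;
    // Dynamic properties added at runtime, per instance.
    private final Map<String, Object> dynamic = new HashMap<>();

    Article(String id, String title) {
        this.id = id;
        this.title = title;
    }

    String id() { return id; }
    String title() { return title; }
    void setTitle(String title) { this.title = title; }

    void set(String key, Object value) { dynamic.put(key, value); }
    Object get(String key) { return dynamic.get(key); }
    boolean has(String key) { return dynamic.containsKey(key); }
}

// Factory-based prototyping: each new Article starts with the prototype's dynamic properties.
class ArticleFactory {
    private final Map<String, Object> prototype = new HashMap<>();

    void addDefault(String key, Object value) {
        prototype.put(key, value);
    }

    Article create(String id, String title) {
        Article a = new Article(id, title);
        prototype.forEach(a::set); // shallow copy of the prototype's defaults
        return a;
    }
}
```

    A property added to one instance stays on that instance, while a default added to the factory reaches every article created afterwards.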

  3. Document Viewer in CMS

    It looks to me like the usual document formats displayed in these systems, such as Word and PDF, are usually converted to HTML, and the formatting can come out quite different or even lose some information. Some approaches, like http://www.elookinto.com, seem to make some progress but still aren’t quite there yet (they either need plug-ins or some Flash conversion).


  4. OpenCms (http://www.opencms.org/en/) allows the creation of XSD-defined XML structured content that is Lucene-indexed and searchable by any field. The files are stored in a relational database and can be accessed via a well-structured API, thus providing a "Virtual File System" to the developer. (http://www.opencms.org/en/support/features/features6/editxmlcontent.html). It also allows PDFs, Office documents, and OpenOffice documents to be uploaded, stored, and indexed.