Content management problems need big-data solutions
By Cameron McKenzie
In the IT world, as in other highly complex fields like medicine, there’s always a temptation to overspecialize. Sure, plenty of vendors promise a holistic approach to software development and design, but even the big software companies that boast a broad portfolio of solutions have typically developed them one piece at a time. More to the point, the largest software corporations tend to grow by acquisition, gobbling up smaller competitors that offer a best-of-breed product serving a particular niche. Despite the trend toward service-oriented architectures (SOA) that emphasize interoperability as a primary goal, higher-level business functions often go missing when solutions are cobbled together from even the best independent pieces.
That’s why strategic integration has become such a critical concern for enterprise architects and IT management. The world’s largest companies are looking for solutions that can deliver on a massive scale across multiple concerns. It’s no longer enough to simply have a data specialist and a content management specialist assisting with the development and implementation of distinct systems. Enterprises that manage massive volumes of data need content management solutions that can deal with big data in real time, and then feed that data into systems that deliver relevant, filtered data to users in a way that is readable, manageable, and time-efficient.
The content is the data
In speaking with TheServerSide.com, Harish Ramachandran, co-founder and project manager at CIGNEX Datamatix, explains how enterprises are increasingly challenged by the volume and variety of data that they must manage. As the content count increases, the backend systems and repositories, which were not originally architected for managing this amount of content, simply can’t keep up. Enterprises can’t use their existing system in its current form and expect to stay competitive.
Ramachandran points to one recent case where a global media conglomerate came to CIGNEX looking for a more effective content management solution. This organization had innumerable news feeds streaming in content from locations all over the globe, and every one of those millions of pieces of content needed to be fed through an editorial system in real time before making its way onto a live website or broadcaster’s desk. As you can imagine, with news now breaking minutes after it happens, efficiency in managing this massive amount of content was paramount.
The big-data big-picture
Viewing the needs of this type of customer from merely a content management perspective might produce a solution that makes content look good during its short time on live display. However, the presentation layer is only a thin sliver of the entire problem. Looking only at the repository side wouldn’t be much better. While it’s still necessary to store news content, its value to a media company is much lower once the story is no longer live. Simply finding a less onerous, more scalable way to store data misses out on the true capabilities of big data to add business value.
What increases the performance of the system as a whole is how content is handled end to end, from the moment it is funneled in at the front end to its representation in the repository structure at the back end. A scalable content management system actually needs to perform all of the following operations on incoming data:
Fortunately, these are tasks that big-data solutions were architected to perform. With a more streamlined workflow, powered by a big-data solution, the editorial process and the end user experience are both improved, while the data repository can be more easily maintained and more readily searched. And with a big-data approach, there may also be opportunities to mine the content itself for business intelligence and predictive analytics.
Solving the CMS problem with big-data solutions
The tools used for integrating big data with content management portals vary according to the needs of each client. A NoSQL solution might be a good choice for documents and other media objects, while Hadoop might be the answer for unstructured data. The goal of CIGNEX in these situations is to apply technology patterns and paradigms in a way that respects the investment large organizations have already made. According to Ramachandran, “We are all about enabling enterprises to adopt these technologies, coexisting with the architecture that these companies have. We aim to build our solutions so that they harmoniously coexist with their commercial and other backend systems.”
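To make that rule of thumb concrete, here is a minimal Python sketch of how incoming content might be routed to a storage backend by type. It is an illustration only, not a description of what CIGNEX builds: the `ContentItem` class, the `kind` labels, the `Backend` names, and the `choose_backend` function are all hypothetical, and a real deployment would call actual database and cluster clients rather than return a label.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    """Hypothetical storage targets; the names are placeholders, not products."""
    DOCUMENT_STORE = "nosql-document-store"  # documents and other media objects
    BATCH_CLUSTER = "hadoop-cluster"         # unstructured data
    EXISTING_SYSTEM = "existing-backend"     # whatever the enterprise already runs


@dataclass
class ContentItem:
    item_id: str
    kind: str  # assumed labels: "document", "media", "unstructured", ...


# Assumed routing table that mirrors the rule of thumb in the text.
ROUTES = {
    "document": Backend.DOCUMENT_STORE,
    "media": Backend.DOCUMENT_STORE,
    "unstructured": Backend.BATCH_CLUSTER,
}


def choose_backend(item: ContentItem) -> Backend:
    """Pick a backend for an item; unknown kinds stay in the existing system."""
    return ROUTES.get(item.kind, Backend.EXISTING_SYSTEM)


if __name__ == "__main__":
    samples = [
        ContentItem("a1", "document"),
        ContentItem("b2", "unstructured"),
        ContentItem("c3", "invoice"),
    ]
    for item in samples:
        print(item.item_id, "->", choose_backend(item).value)
```

Falling back to the existing system for anything the table does not recognize is one way to reflect the coexistence goal Ramachandran describes: new stores are added alongside the current backend rather than replacing it.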
09 Jan 2013