Integrate or disintegrate: How to keep your big data strategy from falling apart
By Jason Tee, Enterprise Software Architect
Is your big data strategy about to fall apart? It's time to pull it together. Enterprises are just getting a handle on how to integrate ERP (enterprise resource planning) and other business applications to dismantle the silos that create inefficiency in business processes. Service-oriented architectures, software as a service, cloud computing and other modern solutions have all played a role in helping enterprises achieve greater application integration. But today, organizations face a new set of challenges from the high volume of data flooding in. To be clear, this isn't a single river of data. It is many separate streams that leave data just as disconnected, or siloed, as the enterprise applications of old.
This is not business as usual
Much of this data is nothing like the enterprise data that businesses are used to handling. With large-scale structured data, most of the challenges of data proliferation could be resolved by addressing scalability, redundancy and analytics. With big data, those are just a few of the problems that enterprises must solve. The data collected today comes from a much broader array of sources. Data from embedded sensors, RFID chips, audio and video feeds, document and image files, graphs and much more comes through the database doors. Social media is blowing away all preconceived notions about what data should look like. That's not even counting big data shared among business partners.
Organizations can no longer readily dictate or constrain the exact format in which data is presented. In fact, attempts to do so would substantially decrease the value of the data itself. An enterprise can only anticipate a certain number of potential scenarios or responses. No matter how many checkboxes or data fields it creates, there will always be data that spills over outside the box. The outcome of ignoring everything that doesn't look like traditional data could be devastating from a competitive standpoint. The recent McKinsey Global Institute study, Big Data: The next frontier for innovation, competition, and productivity, suggests that enterprises are leaving hundreds of billions of dollars on the table by failing to fully leverage their currently available data.
Relational databases are only partial solutions
The burgeoning volume and variety of data are why tools and technologies for managing unstructured data have become so important. Non-relational data stores, including NoSQL, XML and key/value stores, help enterprises resolve both scalability and accessibility issues for much of their big data. Solutions like Hadoop, using MapReduce coupled with the Hive Query Language, offer enterprises a starting point for managing their big data and gaining business intelligence. Other major NoSQL database management systems such as MongoDB and Cassandra already offer integration with Hadoop, making it easier for customers to at least have an interface or overlay that connects disparate data streams.
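For readers unfamiliar with the MapReduce model mentioned above, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample posts are hypothetical, chosen only to show how unstructured text can be reduced to a countable summary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (hashtag, 1) pair for every hashtag in a free-text record."""
    return [(word.lower(), 1) for word in record.split() if word.startswith("#")]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical social media posts standing in for an unstructured data stream.
posts = [
    "Loving the new phone #launch #Gadget",
    "Queue was too long at the #launch",
    "No tags here",
]
counts = run_job(posts)
print(counts)
```

In a real cluster the map and reduce steps would run in parallel across many machines; the sketch only illustrates the division of labor between the phases.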
The data itself is now more mobile within the enterprise as well. Parallel processing and intelligent data chunking tools like Jitterbit are designed to let data flow from one application to the next while maintaining quality. Such integration across data types and applications is key for time-sensitive activities involving real-time analysis. Often, this form of analysis must query both current and historical data to identify emerging trends. This is where SQL often comes back into play.
SQL, NoSQL and big data technology
The new data coming in does not negate the value of the carefully tailored business data that has been collected and generated over the past several decades. Internal enterprise data in SQL data stores is often critical for interpreting both the veracity and relevance of big data from other sources. Many organizations find they still need to maintain a SQL structure for their enterprise data to support their own best business practices. Pushing everything into a non-structured format isn't integration; it is just homogenization. At the same time, trying to force structure onto all unstructured data is likely a wasted effort.
The goal of integration from the enterprise perspective may be less about structure than about organization. Tools like the new Oracle Data Integrator attempt to find a balance by loading and transforming Hadoop data so that it can be more readily analyzed in conjunction with traditional enterprise data. This approach enables the fusion of data from multiple sources and stores during the analytics process, where integration is really needed. This middle-of-the-road approach leaves the original data free to "be what it is", maintaining the hidden value it may hold for new methods of analysis in the future.
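The idea of fusing data at analysis time, rather than forcing everything into one format up front, can be illustrated with a small sketch using Python's standard json and sqlite3 modules. This is not how Oracle Data Integrator works internally; the table names, the fuse function and the sample events are all hypothetical. Only the fields needed for the join are extracted, while each raw record is stored untouched alongside them.

```python
import json
import sqlite3

# Hypothetical semi-structured event lines, e.g. exported from a Hadoop job.
# Note that the records do not share a fixed schema.
raw_events = [
    '{"customer_id": 1, "channel": "twitter", "text": "great service"}',
    '{"customer_id": 2, "channel": "sensor", "reading": 71.3}',
    '{"customer_id": 1, "channel": "email", "text": "invoice question", "attachments": 2}',
]


def fuse(raw_lines):
    """Join raw events with structured customer data and count events per customer."""
    conn = sqlite3.connect(":memory:")

    # Traditional, structured enterprise data (hypothetical sample rows).
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "EMEA"), (2, "Globex", "APAC")],
    )

    # Staging table: pull out only the join keys; keep the original payload as-is.
    conn.execute("CREATE TABLE events (customer_id INTEGER, channel TEXT, payload TEXT)")
    for line in raw_lines:
        event = json.loads(line)
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (event["customer_id"], event["channel"], line),
        )

    # The fusion happens here, in the analytics query, not in the storage format.
    rows = conn.execute(
        "SELECT c.name, c.region, COUNT(*) "
        "FROM customers c JOIN events e ON e.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()
    conn.close()
    return rows


summary = fuse(raw_events)
print(summary)
```

Because the payload column preserves each record exactly as it arrived, fields that are ignored today (such as the sensor reading) remain available for later analysis.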
11 Aug 2013