The challenge of tagging for analytics in a world of unstructured text

Many organizations manage massive amounts of information in their big data systems, but handling that inflow and making sense of it is a major challenge. Tagging is one solution, but how does one tag unstructured text? It's a problem that is slowly being solved.

Analytics is one of the most important mechanisms for maintaining a competitive edge in today's technologically advanced marketplace. Unfortunately, any analytical process is only as complete as the data from which it is derived, and this data is only accessible when it is in a usable format. Historically, converting unstructured text into analyzable data has proven to be a challenge. Yet for the enterprise, the results are likely to be well worth the investment.

While checkbox forms and traditional data fields can collect and maintain core information to create metrics and evaluate the efficiency of many business processes, thoughts expressed in free-form text often contain the most prized pieces of information. From open-ended survey responses and social media comments to notes appended to insurance claims, unstructured text gives enterprises a chance to discover what's really going on behind the scenes. Text mining and analysis enable enterprises to identify associations between seemingly random choices, monitor market trends on a macro or micro scale, and predict or influence consumer behavior with remarkable accuracy.

The case for analyzing text is strong

According to Rebecca Wettemann of Nucleus Research, in her joint paper with SPSS on 'The Real Benefits from Text Mining', "Text mining can help companies leverage all the unstructured information they have about products, services, competitors, and customers to increase customer satisfaction and loyalty." Even in 2008, when Nucleus explored a number of case studies on the topic, organizations were achieving well-documented reductions in churn rate, improved productivity, greater ROI for marketing, and faster R&D because of the insights gained from text mining. Today, the volume and variety of unstructured text sources that can be mined for information are greater than ever, and the tools to configure and analyze this data are maturing as well.

Tagging, also referred to as annotation, is one rapidly evolving technology that classifies and clusters data for analysis. Tagging works in conjunction with predictive analytics tools to grow and refine a knowledge base for mining unstructured text. Businesses disambiguate, tag, and otherwise structure text to glean insights into consumer sentiment and gain a competitive advantage.
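To make the idea concrete, here is a minimal sketch of rule-based tagging in Python. It is not how any product named in this article works; the tag vocabulary, the function names, and the sample comments are all hypothetical, and a production system would rely on far richer linguistic analysis than keyword matching.

```python
import re
from collections import Counter

# Hypothetical tag vocabulary: each tag maps to keywords that signal it.
TAG_KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "shipping": {"delivery", "late", "package"},
    "praise": {"great", "love", "excellent"},
}

def tag_text(text):
    """Return the sorted list of tags whose keywords appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords)

def tag_counts(comments):
    """Count how often each tag occurs across a collection of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tag_text(comment))
    return counts

# Made-up free-text comments standing in for survey or social media input.
comments = [
    "My package was late and I want a refund.",
    "Love the product, excellent support!",
    "Why is there an extra charge on my invoice?",
]
print(tag_counts(comments))
```

Once free text has been reduced to tag counts like these, it can be fed into the same reporting tools that already handle structured fields.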

How is the business world linking tagging to analytics?

According to James Niehaus, VP of Analytics & Digital Strategy at Ensighten, tag management software is most commonly integrated with web analytics solutions such as Google Analytics, Adobe SiteCatalyst, IBM Coremetrics, and Web Trends. Optimization systems like Monetate and Optimizely are also common BI tools that use tagged data as enterprises seek to derive greater direct ROI from their unstructured text. The analysis itself is fed back into further experimentation with multivariate testing (MVT) to uncover even more information about user behavior. Tagging has been slotted fairly seamlessly into the workflow because the concept is simple enough to work with many BI systems.

As Niehaus said, "That's part of the beauty of tag management. This type of solution allows you to pick your tool of choice. It's possible to deliver data from the creation point on your site or digital touch point and map it onto the tool of choice from Google Analytics to Splunk. From there, you can take whatever approach you want using the analytics tools you have."

Tagging not only recognizes existing terms of importance, it also helps predict when and where new forms of tagging should be implemented. It's all part of a continuous feedback system, in the opinion of Brian Bell, VP of Enterprise Solutions at Expert System. "A properly designed predictive analytics solution working with unstructured text is placed in the workflow to add in contextually relevant terms, clustered around core concepts, or categorize around ideas known to enterprise. The hidden benefit of this approach is that it can reveal data that organizations didn't even know they had."

Is there a limit to the information that should be gleaned from unstructured text?

While text mining is designed to bring new information to light, sometimes it can uncover information that's best left unseen. In these situations, determining what to leave out is as important as knowing what to display. For example, appropriate tagging can actually help protect private information. PII (Personally Identifiable Information) may not be relevant to an organization's BI. Certainly not everyone on a business team needs to have access to names, birth dates, and Social Security numbers that may inadvertently be collected along with unstructured text. Bell pointed out that, with proper tag management, potentially sensitive information can be effectively sequestered to prevent accidental exposure before the remainder of the data is analyzed and served up in a report or used for business activities.
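The sequestering step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Expert System's method: the two regular expressions, the `sequester_pii` function, and the sample claim note are hypothetical, and real PII detection needs far broader coverage than two patterns.

```python
import re

# Illustrative patterns only (assumed US-style formats).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def sequester_pii(text):
    """Replace PII matches with tag placeholders.

    Returns the cleaned text plus a list of (tag, original value) pairs
    that can be stored separately under stricter access controls.
    """
    vault = []
    for tag, pattern in PII_PATTERNS.items():
        def _store(match, tag=tag):
            # Record the sensitive value, then substitute its tag.
            vault.append((tag, match.group(0)))
            return f"[{tag}]"
        text = pattern.sub(_store, text)
    return text, vault

# Made-up claim note for demonstration.
note = "Claimant born 04/12/1975, SSN 123-45-6789, reports water damage."
clean, vault = sequester_pii(note)
print(clean)
print(vault)
```

Only the cleaned text would move on to analysis and reporting, while the tagged values stay behind in a restricted store.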

On a larger scale, even with the wealth of information hidden in the flow of Big Data, not every detail is relevant. Knowing what to ignore is just as important as knowing what to inspect. As tagging becomes the norm, businesses will be faced with new questions about when enough is enough. Perhaps a day will arrive when even the tags themselves will need to be tagged for relevance.

How are you performing analytics on unstructured text? Let us know.
