How Amazon tools are solving Big Data problems with Big Data Analytics
By Cameron McKenzie
The whole point of amassing data about your users is to intelligently analyze that data and uncover new trends and unforeseen behavior. Granted, there may never come a time when business intelligence applications will be able to properly sift through the petabytes of data that some organizations are collecting on a daily basis, but that’s no excuse for enterprises to settle for a narrow view of big data analytics. So how can organizations figure out how to best take advantage of their newly amassed data and statistics when only a finite amount of processing is available? It takes time, patience, and, as you’ll see, the wherewithal to put people in charge who can implement the right plan for your business.
Putting the right people in charge
Big Data itself is only a few years into its maturation, which means Big Analytics is only now incubating. That means that there’s a pretty big skills gap in this arena, so finding the right experts is a challenge. In a recent InformationWeek survey about ‘Analytics, Business Intelligence, and Information Management’, “47% of respondents listed ‘expertise being scarce and expensive’ as their primary concern about using big data software.” But to do business intelligence (BI) correctly, finding the right people is an absolute must.
As the lively Data Science Debate at the O’Reilly 2012 Strata Conference confirmed, the decision about who to hire to mine Big Data for big insights isn’t easy, with one of the lightning rod issues being whether domain experts or machine learning specialists offer more value to the business.
Data scientists who focus solely on numbers and patterns have delivered remarkable results in years gone by, using machine learning along with tried and true algorithms to find connections that even the most seasoned domain experts miss. But Big Data consultant Drew Conway makes a strong case that while machine learning as a tool may provide some interesting answers, those answers raise an important question. “Can you interpret the results in any meaningful way?” said Conway. “My guess is, probably not. A domain expert would have to look at that model and decide whether the features that it selected and the outcomes and the coefficients that were derived were actually relevant to a sample outside the training dataset or test set. That’s something fundamental to domain expertise.”
Enterprises will need to build a team that includes experts in both disciplines. To mine data accurately, it makes sense to have a domain expert develop the questions that will be asked, then rely on machine learning experts to develop and implement the queries and create the analysis, and finally have the domain experts make sense of the results.
What's old is new again
And Big Analytics isn’t just about mining the information organizations have acquired since the dawn of the Big Data age. “We’ve seen customers come up with completely new business models where they used historical datasets that they had with some social media datasets in order to monetize or price things that earlier they were not even charging for,” said Intel’s Girish Juneja at Amazon’s latest AWS Summit in San Francisco. But new technology is always the most fertile ground for discovering new insights into user behavior, with mobile users being a particularly fruitful field to harvest. “What we are seeing is that as more and more apps are being driven by mobile users, the amount of data that’s being generated is increasing. Most of that data is getting collected in the cloud environments like AWS. Then, the new business models that are making use of that data and offering new services based on that data are arising.”
Getting ahead of the competition
And what types of tools are enterprises using to sift through their Big Data in order to discover some Big Analytics? Amazon’s Elastic MapReduce has always been a popular choice, helping customers extract BI from Big Data sources that are currently underutilized. In an often-touted case study from a few years ago, Yelp began sorting through its giant compilation of log files, looking for hidden connections. “One of the things they were able to figure out by analyzing that data is that people were visiting the site on mobile devices,” said John Einkauf, Senior Product Manager for Amazon at the 2014 AWS Summit in San Francisco. “This was a couple of years ago, before a lot of companies had really clued in to the shift towards mobile. As a result, they were able to make investments in mobile that are serving them very well today. As of January 2013, they are serving 9.5 million unique mobile devices. It all comes back to this initial insight that they were able to pull out of many, many terabytes of log data.” Identifying the data competitors are currently ignoring and creating a strategy to mine it is the type of approach that differentiates the market leaders from the also-rans.
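To make the log-mining idea concrete, here is a minimal sketch of the map and reduce steps such a job might use to count mobile versus other visits. It is not Yelp’s actual analysis or the Elastic MapReduce API. The log format (user agent as the last quoted field), the marker list, and every function name are assumptions made for illustration, and the code runs locally in plain Python rather than on a cluster.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical substrings that hint at a mobile browser; a real job
# would rely on a proper user-agent parser instead.
MOBILE_MARKERS = ("iphone", "android", "mobile")


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (device_class, 1) for one log line."""
    # Assumed format: the user agent is the last double-quoted field.
    parts = line.rsplit('"', 2)
    if len(parts) < 3:
        return  # skip lines that do not match the assumed format
    agent = parts[1].lower()
    is_mobile = any(marker in agent for marker in MOBILE_MARKERS)
    yield ("mobile" if is_mobile else "other"), 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts per device class."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def count_devices(lines: Iterable[str]) -> Counter:
    """Run the map step over every line, then reduce the results."""
    return reduce_counts(pair for line in lines for pair in map_line(line))


# Made-up sample lines, not real traffic.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2013] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0)"',
    '10.0.0.2 - - [01/Jan/2013] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '10.0.0.3 - - [01/Jan/2013] "GET /biz HTTP/1.1" 200 734 "-" "Mozilla/5.0 (Linux; Android 4.2)"',
    "a malformed line with no quoted fields",
]

if __name__ == "__main__":
    print(count_devices(SAMPLE_LOG))
```

Splitting the work into a per-line map function and a separate summing step mirrors the MapReduce model, where the map step can run in parallel across many log files before the counts are combined.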
Generating the right results
Another often untapped source of data for performing analytics is unstructured data from social sources. Dealing with unstructured data is always a massive challenge because identifying the relevant data is difficult; nevertheless, unstructured data is becoming more and more important in the world of business intelligence and big data analytics. So how does an organization filter out the noise when consuming unstructured text? Most analytics strategies that deal with unstructured text involve a feedback loop to generate more highly targeted data for examination over time. Insights gleaned from existing social sources can then be turned into experiments that can be conducted using social media participants as test subjects. At the enterprise level, this may mean launching various social media campaigns that pose questions, invite commentary, or provoke some other response that can then be measured and analyzed. It’s a time-consuming and highly involved process, but meaningful information obtained through social media can be golden when it comes to learning about what customers really want.
Ironically, many of the solutions that will make Big Analytics most effective will entail collecting and creating even more data. However, with a proactive rather than a reactive strategy, enterprises can position themselves to take advantage of the insights hidden in past, present, and future Big Data.
What strategies do you use to find trends in your Big Data systems? Let us know.
23 Jun 2014