Feature:

Affordable performance and scalability with AWS Big Data solutions

By Cameron McKenzie

TheServerSide.com

Over the past decade, the enterprise database has transformed completely to meet the requirements of Big Data. Scalability, reliability, and speed are three of the biggest challenges facing the traditional relational database, and as relational systems have floundered under gargantuan data processing requirements, a burgeoning need for non-relational database solutions has emerged. As costly relational databases fail at scale, it is no surprise that simple, scalable NoSQL solutions that run on cheap commodity hardware have taken the industry by storm.

“I feel like NoSQL has freed people to look at data structures in a different way,” said Simon St. Laurent, Fluent co-chair and Senior Editor at O’Reilly Publishing. “They can say, ‘These are the pieces that I need, and they may or may not have connections. I don’t have to define that all at the beginning.’ I always loved drawing out the tables on a relational database system. It was fun; it was happy! But then I would change them later, and I would suffer. With NoSQL, you can make yourself suffer for different reasons; for example, you can structure your data badly. But I feel like there’s less ache in iterating and making changes later.” Along with the need to structure data well, there is also enormous pressure to manage the database itself properly in order to achieve optimal performance.

AWS rises to the occasion

Of course, when a scalability and performance problem appears on the computing horizon, it is never long before a tech behemoth steps up to the occasion. Not surprisingly, Amazon launched its Relational Database Service (RDS) in 2009 to help businesses overcome the obstacles of handling traditionally structured Big Data, but a fully managed NoSQL solution lagged behind. The Dynamo concept was originally described in a 2007 paper, and AWS soon began offering non-relational data storage systems based upon it. Like so many inventions ahead of their time, though, the initial incarnation of Dynamo left corporations cold: organizations still had to manage the operational side, with all of its associated complexities. Adding to the frustration, the SimpleDB service aimed at smaller workloads didn’t scale up to meet the needs of the enterprise. AWS needed a non-relational database that could be consumed as a service at the enterprise level. That solution, DynamoDB, only came together in 2012, but it has been hugely successful since.

In his keynote address at the San Francisco AWS Summit in 2014, Senior VP Andy Jassy revealed that DynamoDB is one of the fastest-growing services the company has ever released. It came out of the gate with low latency and high throughput, and AWS has been innovating and improving ever since based on customer feedback. “We added global and local secondary indexes to improve the query flexibility. That was a really big deal for our customers. We added item level access control that allows you to put an access control policy for an item in a table.” Additional features now include parallel scan, batch writes, a geo-spatial indexing library, and various testing tools.
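To make the query-flexibility point concrete, here is a minimal sketch in Python using boto3, the AWS SDK for Python. It queries a table through a global secondary index rather than through the table’s own hash key; the table name, index name, and attribute names are invented for illustration, not taken from any AWS example.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Query by an attribute that is not the table's primary key by going through
# a global secondary index (here, a hypothetical index called "ByUser").
response = dynamodb.query(
    TableName="SessionEvents",
    IndexName="ByUser",
    KeyConditionExpression="UserId = :uid",
    ExpressionAttributeValues={":uid": {"S": "user-1234"}},
)

for item in response["Items"]:
    print(item)

Before secondary indexes were added, a lookup like this would have required a full table scan or a second, manually maintained table keyed by UserId.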

Turning the dials on DynamoDB

Having access to the full breadth of AWS infrastructure and network power as a managed service for non-relational databases has changed how Operations engineers think about their true DB requirements. Dave Albrecht at Crittercism described it this way: “When you think about DBs, a lot of the time the number in your head has to do with the capacity. How big is this—ten gigabytes, ten terabytes, twenty terabytes? Amazon is doing something pretty special and different with its whole provisioned throughput model. They’re letting you pick along both of these two axes. You get to ask, ‘How much capacity do I want on this database, and also how many IOPS do I want on this database?’ It’s kind of crazy that you can just turn those two knobs more or less independently of each other.”
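For readers who want to see those two knobs in code, here is a minimal sketch using boto3, the AWS SDK for Python. In DynamoDB the throughput dials are read and write capacity units, provisioned independently of how much data the table ends up holding; the table name, key schema, and capacity figures below are illustrative, not drawn from Crittercism’s setup.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="SessionEvents",
    AttributeDefinitions=[
        {"AttributeName": "SessionId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "SessionId", "KeyType": "HASH"},
    ],
    # The two throughput "knobs": reads and writes are provisioned separately,
    # independent of how large the table eventually grows.
    ProvisionedThroughput={
        "ReadCapacityUnits": 500,
        "WriteCapacityUnits": 100,
    },
)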

Having the ability to refine the performance of databases allows Operations to support the needs of developers and consumers of enterprise applications at a new level. At the same time, this type of fine-tuned control can create other issues. Developer Tim Gross noted in his blog post series “Falling in and out of love with DynamoDB” that spikes in demand can cause problems with throttling. He recommends using cron jobs, estimation, and careful monitoring to manage this aspect of the service, along the lines sketched below.
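As a rough illustration of the cron-job approach Gross describes, the following boto3 sketch raises a table’s provisioned throughput ahead of an expected spike; a second scheduled run with lower numbers can dial it back down afterwards. The table name and capacity figures are hypothetical, and DynamoDB limits how many times per day capacity can be decreased, so any schedule has to respect that.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def set_capacity(table_name, read_units, write_units):
    """Adjust a table's provisioned read/write throughput."""
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    )

if __name__ == "__main__":
    # Run from cron shortly before an anticipated traffic spike; schedule a
    # companion job with lower numbers once the spike has passed.
    set_capacity("SessionEvents", read_units=2000, write_units=400)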

Enterprises still have many choices to make

Right now, the range of NoSQL databases on offer is far broader than the market targeted by DynamoDB. AWS currently partners with MongoDB and Couchbase to help customers run non-relational databases on EC2 and EBS. However, enterprises that need a variety of non-relational databases to handle specific types of data will have to explore less standardized solutions in the cloud. Developers can install any standard NoSQL database they wish on Amazon’s EC2, and this may be the preferred option for organizations that already have the in-house skills and expertise to manage scaling and the other aspects of administration. No doubt the number of AWS partnerships with major non-relational database providers will continue to grow, allowing enterprises to hand over more and more of their Big Data management to the AWS cloud.

How have you leveraged NoSQL in your enterprise solutions? Let us know.

 

22 Jun 2014
