Yale graph DB project uncovers hidden big data trends
By Jason Tee
Do you ever wonder what computer scientists cook up in their labs while the commercial IT world is busy chasing money and short-term business goals? You might be surprised to learn that many systems with startling and highly practical real-world applications get their start in academia, in the hands of research fellows. In fact, some of the next really big breakthroughs in big data and distributed computing could well come from the Ivy League, not the IT league. TheServerSide got to take a peek inside the brain of Dr. Daniel Abadi, Associate Professor of Computer Science at Yale University, to find out what's in store for big data and database technology. We have to warn you: it's going to be pretty graphic.
Exploring the edge with graph DB
When it comes to big data, it's not always what you know that matters, but how each piece of information relates to everything else. That's where being able to take big data and plot it on a graph becomes very valuable. The Yale graph database project is all about storing, working with, and analyzing data that is best represented as vertices, and then exploring the edges between them. This information can be analyzed to find matching patterns, calculate shortest paths, and uncover influences. According to Dr. Abadi, "We're interested in data that is naturally represented by a graph: things like social networks, linked data and RDF data." Abadi mentioned casually that terrorist networks fall into this category too, since "who is talking to whom" is just the kind of data you'd want to plot on a graph to check for patterns. So consider yourself warned: be careful who you friend on Facebook, because the Department of Homeland Security or MI6 might soon be watching.
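To make "vertices, edges, patterns and shortest paths" concrete, here is a minimal sketch in plain Python. It is not the Yale project's code; the contact list and the function names are hypothetical. It stores a small "who talks to whom" graph as an adjacency list, finds a shortest path with breadth-first search, and runs a simple pattern match that looks for triangles (three people who all talk to each other).

```python
from collections import deque

# Hypothetical "who talks to whom" data: vertices are people, edges are contacts.
CONTACTS = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
            ("carol", "dave"), ("dave", "erin")]

def build_graph(edges):
    """Store an undirected graph as a dict mapping each vertex to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path(adj, start, goal):
    """Breadth-first search: return the vertices on a shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:  # walk parent links back to the start
                path.append(v)
                v = parents[v]
            return path[::-1]
        for w in sorted(adj.get(v, ())):
            if w not in parents:
                parents[w] = v
                queue.append(w)
    return None

def triangles(adj):
    """A simple pattern match: every set of three mutually connected vertices."""
    found = set()
    for u in adj:
        for v in adj[u]:
            for w in adj[u] & adj[v]:  # common neighbors of u and v
                found.add(frozenset((u, v, w)))
    return found

GRAPH = build_graph(CONTACTS)
print(shortest_path(GRAPH, "alice", "erin"))  # ['alice', 'carol', 'dave', 'erin']
```

A dictionary of neighbor sets keeps the sketch short. Production graph stores use far more compact, disk- or cluster-aware layouts.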
Doing graph DB the right way
The type of data managed using a graph DB is often huge, complex, and very dense. Plus, new data is being added all the time from mobile devices and the Internet of Things. With any large graph, you have to be able to scale easily and efficiently while retaining the ability to access and analyze data for real-time business decision-making. Abadi says the trick is to figure out how to partition the data and store it on multiple machines for distributed computing. But you can't just hack it into pieces without considering how it's all going to fit back together again. You have to cluster it in an intelligent way. "If you don't partition it correctly, a lot of the algorithms you want to run become non-local. Multiple nodes have to get involved in the calculation. The more you can minimize that, the more scalable your system becomes." It's not possible to eliminate communication between nodes entirely, but the less of it there is, the more effectively the nodes can work in parallel.
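Abadi's point about non-local algorithms can be measured with the "edge cut": the number of edges whose two endpoints land on different machines, since each one implies network traffic. The sketch below is only an illustration. The article does not say which partitioning method the Yale project uses, and the toy graph, the naive hash placement, and the greedy breadth-first placement are all hypothetical stand-ins (real systems typically use dedicated partitioners such as METIS). It compares a placement that ignores structure with one that keeps tightly knit groups together.

```python
from collections import deque

# Hypothetical toy graph: two tightly knit groups (0-3 and 4-7) joined by one edge.
CLUSTER_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                 (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
                 (3, 4)]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hash_partition(adj, k):
    """Naive placement: machine = vertex id mod k, ignoring graph structure."""
    return {v: v % k for v in adj}

def bfs_partition(adj, k):
    """Greedy locality-aware placement: grow each part breadth-first until it is full."""
    capacity = -(-len(adj) // k)  # ceiling of n / k
    sizes = [0] * k
    part = {}
    current = 0
    for seed in sorted(adj):
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            if v in part:
                continue
            if sizes[current] >= capacity:
                current += 1  # this machine is full; start filling the next one
            part[v] = current
            sizes[current] += 1
            queue.extend(w for w in sorted(adj[v]) if w not in part)
    return part

def edge_cut(edges, part):
    """Count edges whose endpoints sit on different machines (cross-node traffic)."""
    return sum(1 for u, v in edges if part[u] != part[v])

ADJ = build_adjacency(CLUSTER_EDGES)
print(edge_cut(CLUSTER_EDGES, hash_partition(ADJ, 2)))  # 9 of 13 edges cross machines
print(edge_cut(CLUSTER_EDGES, bfs_partition(ADJ, 2)))   # 1 of 13 edges crosses machines
```

Both placements split the eight vertices evenly across two machines, but the structure-aware one leaves only the single bridging edge crossing the network.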
Graph DB and business analytics
The most obvious early use case is likely to be social media: uncovering the hidden big influencers and how they affect the purchasing decisions of others. Pinpointing where a ripple effect starts eventually means being able to start ripples of your own by dropping a pebble in just the right spot. And with graph-based technologies built to handle big data, finding the right spot to drop that pebble is becoming easier and easier.
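One common way to score influence on a graph is PageRank, which rates a person highly when other influential people point to them. The article does not say which influence measure Abadi's group uses, so treat this as a swapped-in illustration: the "follows" data is hypothetical, and the function is a minimal power-iteration sketch, not any particular library's API.

```python
# Hypothetical "follows" graph: an edge u -> v means u follows (and is swayed by) v.
FOLLOWS = {
    "ann": ["zoe"],
    "ben": ["zoe"],
    "cal": ["zoe", "ann"],
    "dee": ["ben"],
    "zoe": ["cal"],
}

def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank: a vertex scores highly when influential vertices point to it."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)  # split u's rank among its targets
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank

rank = pagerank(FOLLOWS)
print(sorted(rank, key=rank.get, reverse=True))  # ['zoe', 'cal', 'ann', 'ben', 'dee']
```

Note that "cal" ranks second despite having a single follower, because that follower is the top-ranked "zoe". Surfacing that kind of non-obvious influencer is what raw follower counts miss.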
05 May 2013