It took a long time for software architects to look beyond structured databases as a solution to common problems, but there is little debate that NoSQL databases have filled a serious void in the software development space. Extending the success of NoSQL databases and big data solutions, architects are now realizing that a new approach to working with data, be it a graph database or a graphing engine, can help simplify problems that don't fit nicely into a relational or standard NoSQL database domain.
An important part of simplifying a recommendation engine is identifying the right data abstraction. At first blush, many developers could be tempted to bolt some logic onto an SQL database. A much better abstraction is to leverage a graph engine library, said Keith Horwood, founder of Polybit, a software development firm, at the Fluent Conference in San Francisco.
Graphing techniques have considerable applications beyond providing recommendations and beyond the capabilities of NoSQL databases. They can help make sense of patterns buried in data, improve social network applications, and improve master data management throughout a supply chain.
Horwood explained how he used common open source components to rebuild a better recommendation engine for a property listing service called TheStorefront.com. The client provides a marketplace for short-term retail rentals. The company wanted to increase its resource usage and the number of requests from users. The new engine needed to be created in about two weeks with minimal technical debt.
The problem with rules
A big part of the challenge was that no one was entirely sure which factors were important to user preferences. TheStorefront.com had previously built a rules-driven recommendation engine on top of an SQL database. The developers invented rules that seemed important, but these did not always reflect user behavior. Horwood said, "Engineers don't think like people looking for retail space."
Furthermore, the rules were complex, brittle and difficult to change. This resulted in a long search process that took 300 milliseconds to generate each new set of property recommendations for a user. As a result, it took hours to generate recommendations for the entire user base. A graph-based approach allowed them to generate better recommendations in less than one millisecond.
Graphing engines leverage algorithms from social networking applications to model the relationships between many objects. Nodes are used to represent entities like users and properties, while edges represent the relationships between them. In this specific case, an edge could represent a listing that a user has purchased, favorited or visited. Edges are weighted highest for prior purchases, then favorites, and finally visits. Horwood said, "This represents different levels of intent by the users."
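The model described above can be sketched in a few lines of Python. This is a minimal illustration and not UnitGraph's API: the class name, method names, and the numeric weights (3, 2, 1) are assumptions, since the talk only gave the ordering of purchases over favorites over visits.

```python
from collections import defaultdict

# Assumed weights: only the ordering purchase > favorite > visit comes from the talk.
EDGE_WEIGHTS = {"purchase": 3.0, "favorite": 2.0, "visit": 1.0}


class InteractionGraph:
    """Undirected weighted graph whose nodes are users and listings."""

    def __init__(self):
        # node -> {neighbor node: accumulated edge weight}
        self.adjacency = defaultdict(lambda: defaultdict(float))

    def add_interaction(self, user, listing, kind):
        """Add (or strengthen) the edge between a user and a listing."""
        weight = EDGE_WEIGHTS[kind]
        user_node, listing_node = ("user", user), ("listing", listing)
        # Store the edge in both directions so traversal can start from either side.
        self.adjacency[user_node][listing_node] += weight
        self.adjacency[listing_node][user_node] += weight

    def weight(self, user, listing):
        """Return the accumulated weight of a user-listing edge, or 0.0 if absent."""
        return self.adjacency.get(("user", user), {}).get(("listing", listing), 0.0)


graph = InteractionGraph()
graph.add_interaction("alice", "soho-popup", "visit")
graph.add_interaction("alice", "soho-popup", "favorite")
graph.add_interaction("bob", "soho-popup", "purchase")
```

Repeated interactions accumulate on the same edge here, which is one possible design choice; a real engine might instead keep only the strongest signal per user and listing.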
Graph-based solutions are node or vertex type agnostic. This makes it easy to start determining relationships between many users and listings. It does not matter at which node the algorithm starts. This makes it easy to compute recommendations by starting with the users or properties. In addition, this approach makes it easy to generate recommendations based on users' prior behavior, rather than a predefined set of rules.
The more data from a user, the better the recommendations. Furthermore, the results can be improved when users with similar tastes book, favorite or visit properties in the database. Graph traversal calculations are easy to perform. This eliminates the need for SQL optimization and giant joins, Horwood said.
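To make the traversal idea concrete, here is a rough Python sketch of one common way to score recommendations: walk from a user to the listings they interacted with, on to other users who touched those listings, and then to listings the first user has not seen. The scoring rule (multiplying edge weights along the path), the weight values, and all names are assumptions for illustration, not the algorithm UnitGraph actually uses.

```python
from collections import defaultdict

WEIGHTS = {"purchase": 3.0, "favorite": 2.0, "visit": 1.0}  # assumed values


def build_adjacency(interactions):
    """Index (user, listing, kind) rows from both the user and the listing side."""
    user_to_listings = defaultdict(lambda: defaultdict(float))
    listing_to_users = defaultdict(lambda: defaultdict(float))
    for user, listing, kind in interactions:
        weight = WEIGHTS[kind]
        user_to_listings[user][listing] += weight
        listing_to_users[listing][user] += weight
    return user_to_listings, listing_to_users


def recommend(user, user_to_listings, listing_to_users, limit=5):
    """Three-hop walk: user -> listing -> peer user -> unseen listing."""
    seen = user_to_listings.get(user, {})
    scores = defaultdict(float)
    for listing, w1 in seen.items():
        for peer, w2 in listing_to_users[listing].items():
            if peer == user:
                continue
            for candidate, w3 in user_to_listings[peer].items():
                if candidate not in seen:
                    # Stronger intent along the path yields a higher score.
                    scores[candidate] += w1 * w2 * w3
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [listing for listing, _ in ranked[:limit]]


interactions = [
    ("alice", "loft", "purchase"),
    ("bob", "loft", "favorite"),
    ("bob", "kiosk", "purchase"),
    ("bob", "gallery", "visit"),
    ("carol", "gallery", "visit"),
]
user_index, listing_index = build_adjacency(interactions)
recommendations = recommend("alice", user_index, listing_index)
print(recommendations)  # ['kiosk', 'gallery']
```

Because both indexes are plain in-memory dictionaries, the walk needs no joins, and the same structure could be traversed starting from a listing instead of a user.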
Implementing the graph
One good graphing database is Neo4j, which is well supported and documented. Horwood said it works well for building complex data structures. But they wanted things simple since there were only two kinds of entities and three types of edges in this particular implementation. The goal was to implement the project on top of Node.js, which was already running as part of the main TheStorefront.com application. So Horwood built UnitGraph, an open source graph traversal library that runs on top of Node.js.
The graph is generated from the relational database once per day and then cached on disk. This allows new recommendations to be generated quickly, even if the server goes down. This approach works well for modest database sizes. In this case, the application data could be processed entirely in RAM.
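A daily rebuild with an on-disk cache might look something like the following Python sketch. The table name, column names, JSON file format, and weight values are all hypothetical; the article does not describe TheStorefront.com's schema or how UnitGraph serializes a graph.

```python
import json
import os
import sqlite3
import tempfile

WEIGHTS = {"purchase": 3.0, "favorite": 2.0, "visit": 1.0}  # assumed values


def build_graph(connection):
    """Read interaction rows and return {user_id: {listing_id: weight}}."""
    graph = {}
    rows = connection.execute("SELECT user_id, listing_id, kind FROM interactions")
    for user_id, listing_id, kind in rows:
        edges = graph.setdefault(user_id, {})
        edges[listing_id] = edges.get(listing_id, 0.0) + WEIGHTS[kind]
    return graph


def save_graph(graph, path):
    """Write to a temporary file, then rename, so a crash never leaves a partial cache."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(graph, handle)
    os.replace(tmp_path, path)


def load_graph(path):
    """Reload the cached graph, for example after a server restart."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo with an in-memory SQLite database standing in for the relational store.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE interactions (user_id TEXT, listing_id TEXT, kind TEXT)"
)
connection.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [
        ("alice", "loft", "purchase"),
        ("alice", "loft", "favorite"),
        ("bob", "kiosk", "visit"),
    ],
)
cache_path = os.path.join(tempfile.mkdtemp(), "graph.json")
save_graph(build_graph(connection), cache_path)
restored = load_graph(cache_path)
```

In a setup like this, `build_graph` and `save_graph` would run from a daily scheduled job, while the serving process only ever calls `load_graph`.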
Several optimizations can also improve query performance. For example, the recommendation engine might only look at user interactions over the last week or month. Another optimization is to automatically remove unused nodes, such as inactive users and properties, from the engine. Horwood said a simple problem with only a few entity types is easy to implement using UnitGraph. Larger data sets would benefit from third-party solutions.
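Both optimizations can be illustrated with a short Python sketch. The row layout (user, listing, kind, timestamp), the 30-day window, and the function names are assumptions made for the example, not details from the talk.

```python
from datetime import datetime, timedelta


def recent_interactions(interactions, now, window_days=30):
    """Keep only interactions that fall inside the look-back window."""
    cutoff = now - timedelta(days=window_days)
    return [row for row in interactions if row[3] >= cutoff]


def active_nodes(interactions):
    """Return the users and listings that still have at least one edge."""
    users = {user for user, _, _, _ in interactions}
    listings = {listing for _, listing, _, _ in interactions}
    return users, listings


now = datetime(2024, 6, 30)
interactions = [
    ("alice", "loft", "visit", datetime(2024, 6, 25)),
    ("bob", "kiosk", "purchase", datetime(2024, 1, 10)),  # outside the window
    ("carol", "loft", "favorite", datetime(2024, 6, 1)),
]

recent = recent_interactions(interactions, now, window_days=30)
users, listings = active_nodes(recent)
# bob and the kiosk listing have no recent edges, so they drop out of the graph.
```

Filtering before the graph is built keeps both the node count and the traversal fan-out small, which matters when the whole graph has to fit in RAM.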
Have you used a graphing engine to solve complex problems in your architecture? Let us know.