
How Pandora built a better recommendation engine

Find out how Pandora developed an AI application for building a better recommendation engine that combines novelty and comfort.

Pandora has quickly become one of the most popular music streaming services. Built on top of a Java, Scala, and Erlang infrastructure, it features customized radio stations that automatically play music in the genres a user chooses. As larger players like Apple and Amazon entered the field, Pandora realized it needed a better recommendation engine to keep growing its user base.

"The vision of Pandora is to be an effortless source of music," said Oscar Celma, director of research at Pandora. The stations are automatically generated playlists based on feedback from users. Celma said the challenge lies in balancing new music with popular music. This cloud-based strategy has allowed the company to grow to 78 million users who listen for an average of 24 hours per month.

This secure, cloud-based strategy was incorporated in the Pandora Thumbprint Radio station, which provides a personalized soundtrack for each user. This one station has quickly grown to become the most popular station on Pandora with 30 million active listeners, making up 3% of overall listening. Pandora's research team found success by experimenting with over 70 different algorithms and a sophisticated testing strategy.

Deconstructing the recommendation engine

Pandora takes a multi-tiered approach to evaluating and recommending music. A team of musicologists annotates songs based on genre, rhythm, and progression. These annotations are transformed into a vector for each song, so that song similarity can be computed by comparing vectors. This approach helps surface long-tail music from unknown artists that might be a good fit for a particular user.
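As a rough illustration of comparing songs as vectors, the sketch below uses cosine similarity. The article does not say which similarity measure Pandora uses, so cosine similarity is an assumption here, and the three-number feature vectors, song names, and the `most_similar` helper are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical annotations: [genre score, rhythm score, progression score].
songs = {
    "song_a": [0.9, 0.2, 0.5],
    "song_b": [0.8, 0.3, 0.4],
    "song_c": [0.1, 0.9, 0.7],
}


def most_similar(seed, catalog):
    """Return the other song IDs, ordered from most to least similar to the seed."""
    others = [song for song in catalog if song != seed]
    return sorted(
        others,
        key=lambda song: cosine_similarity(catalog[seed], catalog[song]),
        reverse=True,
    )


print(most_similar("song_a", songs))
```

Because the comparison depends only on the annotations and not on play counts, an obscure song can rank as highly as a hit, which is how a content-based tier can favor long-tail music.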

The service also takes advantage of feedback from users. It has gathered almost 75 billion points of feedback about what users like. Pandora's recommendation algorithms also apply personalized filtering based on a user's choice in music, the stations they listen to, and their geography.

The recommender uses about 70 different algorithms: 10 analyze content, 40 process collective intelligence, and then another 30 do personalized filtering. Celma said, "This is challenging from an engineering point of view. We have the goal that when you thumb down a song, the recommendation for the next song occurs in less than 100 milliseconds. It is hard to do this in a way that scales across all users."

Balance new with familiar

One of the biggest challenges in building a better recommendation engine for music, TV shows, books, or products lies in balancing the novel with the familiar. The personalization problem can be more difficult because moods change. Users might not like songs today that were their favorites last year. Sometimes users just want familiar songs, while at other times they get more excited by new songs.

The recommendation engine needs to play some new songs to find out what users like. But if users end up disliking too many songs, they may be reluctant to return to the service. One useful way to think about the space is as an XY graph with separate axes for novelty and relevance. Familiar songs are relevant, but have low novelty. Celma said, "Ideally the exploration should fall into what they think is relevant but highly novel."
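One way to picture the novelty and relevance axes is to score each candidate song so that it ranks highly only when both values are high. The sketch below multiplies the two scores. This is purely illustrative: the article does not describe how Pandora combines the axes, and the candidate names, score values, and function names are hypothetical.

```python
def exploration_score(relevance, novelty):
    """Product of two scores in [0, 1]; high only when both are high."""
    return relevance * novelty


# Hypothetical candidates with made-up relevance and novelty scores.
candidates = [
    {"song": "old_favorite", "relevance": 0.9, "novelty": 0.1},
    {"song": "fresh_match", "relevance": 0.8, "novelty": 0.9},
    {"song": "random_new", "relevance": 0.2, "novelty": 0.95},
]


def pick_exploration(songs):
    """Return the candidate with the highest exploration score."""
    return max(
        songs,
        key=lambda c: exploration_score(c["relevance"], c["novelty"]),
    )


print(pick_exploration(candidates)["song"])
```

A familiar favorite scores low because its novelty is low, and a random unknown song scores low because its relevance is low, so the pick lands in the relevant but novel corner of the graph.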

Most other music recommenders start with popular songs and then slowly add more novelty as they learn about the user. Pandora has been pushing the edge by suggesting novel songs that users like more quickly. However, Celma stressed that it is important to build trust and add novelty slowly to keep users coming back.

A key goal of Thumbprint Radio is to create a playlist a user might make for themselves, based on their previous interactions with different Pandora stations. But this has raised numerous questions. How often should it switch from one genre to another? How should it balance familiarity and discovery?

A typical user might subscribe to 100 different stations, but only actively listen to 4. The Thumbprint algorithms only include listening data from a station when a user has provided feedback on at least 4 songs on that station. They also weight genres based on how often a user listens to a particular category of songs.
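The two rules in the previous paragraph can be sketched as a station filter and a genre weighting. The 4-song threshold comes from the article. Everything else is an assumption made for illustration: the data shapes, the function names, and the choice to weight each genre by its simple share of listens.

```python
from collections import Counter

MIN_FEEDBACK_SONGS = 4  # threshold stated in the article


def eligible_stations(feedback_counts):
    """Keep stations where the user gave feedback on enough songs.

    feedback_counts maps a station name to the number of songs on that
    station that the user has thumbed up or down (hypothetical shape).
    """
    return [
        station
        for station, count in feedback_counts.items()
        if count >= MIN_FEEDBACK_SONGS
    ]


def genre_weights(listened_genres):
    """Weight each genre by its share of the user's listens.

    listened_genres holds one genre label per song the user listened to.
    """
    counts = Counter(listened_genres)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: count / total for genre, count in counts.items()}


print(eligible_stations({"jazz": 5, "rock": 3, "blues": 4}))
print(genre_weights(["jazz", "jazz", "rock", "jazz"]))
```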

Testing recommendation engine preferences

The quickest approach for assessing the quality of a new recommendation engine is to test it offline against historical user data. In this case, the team analyzes 11 months of user data to generate a series of recommendations. The new recommender is then validated against feedback from 1 month of actual user interactions. Celma said, "We wanted [to] do some internal testing before we release it into the wild."
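A minimal sketch of this kind of offline test splits a feedback log at a cutoff date, builds recommendations from the earlier part, and scores them against the later part. The event format, the dates, and the `hit_rate` metric are assumptions made for illustration. The article does not say which offline metric Pandora uses.

```python
from datetime import date


def split_by_cutoff(events, cutoff):
    """Split events into those before the cutoff and those on or after it."""
    train = [e for e in events if e["day"] < cutoff]
    held_out = [e for e in events if e["day"] >= cutoff]
    return train, held_out


def hit_rate(recommended, held_out):
    """Fraction of recommended songs the user thumbed up in the held-out period."""
    if not recommended:
        return 0.0
    liked = {e["song"] for e in held_out if e["thumb"] == "up"}
    return sum(1 for song in recommended if song in liked) / len(recommended)


# Hypothetical feedback log for one user over twelve months.
events = [
    {"day": date(2015, 1, 10), "song": "a", "thumb": "up"},
    {"day": date(2015, 6, 2), "song": "b", "thumb": "down"},
    {"day": date(2015, 12, 5), "song": "c", "thumb": "up"},
    {"day": date(2015, 12, 20), "song": "d", "thumb": "down"},
]

# The first 11 months train the recommender; the final month validates it.
train, held_out = split_by_cutoff(events, cutoff=date(2015, 12, 1))

# Stand-in for the output of a recommender trained on `train`.
recommended = ["c", "d"]
print(hit_rate(recommended, held_out))
```

Splitting by time rather than at random keeps the test honest: the recommender never sees feedback from the month it is scored on.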

A second tier of testing is done with a small group of listeners to see how the recommender stands up in the real world. In this case, the system might test a new algorithm on 1% of users and compare the results with the existing recommender. Pandora might have several of these experiments running at any given time, and the results are analyzed over a period of 6-9 months. This longer testing period gives the team time to gather better data about usage and the number of likes and dislikes recorded by users.
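One common way to expose a fixed share of users to an experiment is to hash each user ID into a bucket, so that the same user always gets the same answer. The sketch below shows that general technique. It is not a description of Pandora's system, and the function name, experiment name, and user ID format are hypothetical.

```python
import hashlib


def in_experiment(user_id, experiment, percent=1.0):
    """Deterministically place roughly `percent` percent of users in an experiment.

    Including the experiment name in the hash means that concurrent
    experiments select different, largely independent sets of users.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 10000  # buckets 0..9999, each 0.01% of users
    return bucket < percent * 100


# Roughly 1% of these hypothetical user IDs land in the experiment.
selected = sum(in_experiment(f"user-{i}", "new-recommender") for i in range(100000))
print(selected)
```

Because assignment depends only on the user ID and the experiment name, no assignment table has to be stored, and a user stays in the same group for the full length of a long-running test.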

Most metrics are based on the amount of time users spend listening to music and the number of different days they return to the app. These metrics don't tend to change much from day to day, since meaningful differences tend to be small and slow to emerge. A typical change between two implementations of a recommender might be only a 0.2% increase in listening time. Overall, the team has found that more engaged users tend to be more open to novelty. These users also tend to give better feedback on their experience overall.
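To make the size of such a change concrete, the sketch below computes the relative lift in mean listening time between a control group and a test group. The sample values are made up and chosen so the lift works out to 0.2%. A real comparison would involve many users and a significance test, which this sketch omits.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


def relative_lift(control, treatment):
    """Percentage change in the mean from the control group to the test group."""
    base = mean(control)
    return (mean(treatment) - base) / base * 100.0


# Hypothetical monthly listening hours per user in each group.
control_hours = [24.0, 24.0]
treatment_hours = [24.048, 24.048]

print(round(relative_lift(control_hours, treatment_hours), 3))
```

At an average of 24 hours of listening per month, a 0.2% lift is under three minutes per user, which helps explain why experiments need to run for months before the difference is measurable.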

It can be challenging trying to analyze too much data about the number of times users forward, replay, or like songs. Celma said, "Adding more signals to the analysis can yield more data, but there is also more noise."
