Really? Craigslist archives all of their posting data? So that fake posting I put up about selling my Viper in 2004 is still in a database somewhere for someone to go in and data mine? I thought there were all sorts of pieces of privacy rights legislation that required them to purge all of that data, not archive it. Apparently the opposite is true?
But here's the point: Craigslist has a massive amount of data to archive. We're talking about over a billion posts, and once you pass a billion records of anything, you're dealing with some genuinely big numbers.
As you might imagine, Craigslist has two separate systems: one for live posts and one for archived posts. But for many a year, that archive simply mirrored the data structure of the MySQL servers handling the live data. Of course, that meant any schema change on the front end required a corresponding change on the back end, and that's when all hell would break loose. So Craigslist made a change: they went all NoSQL on the back end, but kept everything very relational on the front end. Crazy? Well, they were crazy enough to make it all work.
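To make the schema-coupling problem concrete, here's a toy sketch in Python. It is not Craigslist's code and it doesn't use the MongoDB API; both classes, the column names, and the sample rows are hypothetical stand-ins. An archive that mirrors a fixed relational schema rejects rows the moment the live side adds a column, while a schema-less archive that stores each post as a self-describing JSON document just keeps whatever fields the row happened to have.

```python
import json
from typing import Any


class FixedSchemaArchive:
    """Hypothetical archive that mirrors a fixed relational schema (the old approach)."""

    COLUMNS = ("post_id", "title", "price")

    def __init__(self) -> None:
        self._rows: dict[int, tuple] = {}

    def archive(self, row: dict[str, Any]) -> None:
        extra = set(row) - set(self.COLUMNS)
        if extra:
            # A new column on the live side breaks the archive until it is migrated too.
            raise ValueError(f"archive schema lacks columns: {sorted(extra)}")
        self._rows[row["post_id"]] = tuple(row.get(c) for c in self.COLUMNS)


class DocumentArchive:
    """Hypothetical schema-less archive: each post is stored as a JSON document."""

    def __init__(self) -> None:
        self._docs: dict[int, str] = {}

    def archive(self, row: dict[str, Any]) -> None:
        # No fixed columns: the document carries whatever fields the row had.
        self._docs[row["post_id"]] = json.dumps(row)

    def fetch(self, post_id: int) -> dict[str, Any]:
        return json.loads(self._docs[post_id])


old_row = {"post_id": 1, "title": "Viper for sale", "price": 50000}
# After a schema change on the live side, new rows carry an extra column.
new_row = {"post_id": 2, "title": "Road bike", "price": 300, "neighborhood": "Mission"}

fixed, docs = FixedSchemaArchive(), DocumentArchive()
fixed.archive(old_row)
try:
    fixed.archive(new_row)
except ValueError as err:
    print("fixed-schema archive:", err)  # fixed-schema archive: archive schema lacks columns: ['neighborhood']

docs.archive(old_row)
docs.archive(new_row)
print(docs.fetch(2)["neighborhood"])      # Mission
print(docs.fetch(1).get("neighborhood"))  # None
```

The trade-off is that the document archive pushes schema knowledge onto whoever reads it back: old posts simply lack the new field, so readers have to cope with missing keys.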
Check out the following article, in which TheServerSide interviews one of 10gen's marketing shills to find out more about how Craigslist used MySQL and NoSQL to solve its big-data problems.