MapReduce Reading
For my 8-hour trip to Kalamazoo last week, I printed a few Google white papers for some “light reading”. One of these was MapReduce: Simplified Data Processing on Large Clusters, which was recently updated. I read the original version last year and wanted to catch up. From the paper:
Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day.
Meh. 20 petabytes? You should see the JAR files in *my* app, I tell ya…
Seriously, I did not read that paper and think, “hmm… that’s kind of like a database.” I cannot imagine anyone thinking that. Nor is MapReduce an index. You can use it to *create* an index, for example. But still… MapReduce does not compete with a database in any way. It is an entirely different tool, for an entirely different kind of problem. Yet we have the DeWitt/Stonebraker article, described below.
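To make the “use it to *create* an index” point concrete, here is a toy sketch in plain Python of the inverted-index example the paper describes: map emits (word, document ID) pairs, and reduce collects the document IDs for each word. This is not Google’s API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the sample documents are my own assumptions for illustration, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical single-process stand-in for a MapReduce job; names are illustrative.

def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, doc_ids):
    """Reduce: collapse all document IDs for a word into a sorted posting list."""
    return word, sorted(set(doc_ids))

def build_inverted_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

# Made-up sample input, not from the paper.
docs = {
    "d1": "map reduce is not a database",
    "d2": "a database has an index",
}
index = build_inverted_index(docs)
```

The output is an index you could load into something else and query. The job that built it is a batch computation over the input, which is the distinction I am drawing above.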
More Interesting Reading
The punchline is that upon my triumphant return to O’Fallon, MO, I discovered everyone’s blogging about MapReduce. OK, that’s just a weird coincidence. I think this unbelievably inaccurate article written by David J. DeWitt and Michael Stonebraker (everybody’s saying they are database experts…) sparked a lot of the debate. It’s generally not cool to link to really awful material, but this one is worth it for the sheer entertainment factor. I recommend reading in this order:
- First, read the Google MapReduce PDF if you need some background. It is fascinating.
- Next, read this wonderful debunking of the “database guru” article.
- Then read MapReduce: A major step backwards, but read the comments first. The comments contain many well-thought-out, solid criticisms.
The most telling part of all this is that the original authors do not participate in the comments at all. They are being ripped to pieces (FOR GOOD REASON) and they say nothing.
David J. DeWitt and Michael Stonebraker should retract their highly inaccurate article.
