Evolution of Google File System
August 08, 2009 07:26:40 EDT
ACM Queue has an interesting interview about the evolution of the Google File System. I think it is readable by anybody, not just ACM members. One of the morals of this story is that, even if you are building what you think will be the world's biggest system, you will still make design decisions that you know will not scale, because you know how to implement them. It is better to get something running right away and start using it. Of course, they also ran into scalability problems that they did not expect. So some of the evolution of GFS was planned, and some was unplanned.
Dumbo is a project that allows you to easily write and run Hadoop programs in Python (it’s named after Disney’s flying circus elephant, since the logo of Hadoop is an elephant and Python was named after the BBC series “Monty Python’s Flying Circus”). More generally, Dumbo can be considered to be a convenient Python API for writing MapReduce programs.
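To give a feel for what such a program looks like, here is a minimal word-count sketch. The mapper and reducer are plain Python generators in the style Dumbo expects; the `run_locally` helper is a hypothetical stand-in written for this post, not part of Dumbo, that imitates the map, shuffle, and reduce steps in a single process so the example runs without Hadoop.

```python
from collections import defaultdict


def mapper(key, value):
    # key: position of the line in the input (unused here)
    # value: one line of text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values: every count emitted for this word
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper (not part of Dumbo): map, group by key, reduce."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            groups[k].append(v)
    result = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    # With Dumbo installed, the same two functions would instead be handed
    # to dumbo.run(mapper, reducer) and the script launched on the cluster.
    print(run_locally(["the elephant flies", "the circus"]))
```

The point of the design is that the two functions know nothing about Hadoop: they only consume a key and a value (or a key and its values) and yield pairs, which is what makes them easy to test in isolation before running them on a cluster.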