The PageRank algorithm is a great way of using collective intelligence to determine the importance of a webpage. There's a big problem, though: PageRank is difficult to apply to the web as a whole, simply because the web contains so many webpages. While just a few lines of code suffice to implement PageRank on collections of a few thousand webpages, it's much trickier to compute PageRank for larger sets of pages. The underlying problem is that the most direct way to compute the PageRank of n webpages involves inverting an n × n matrix. Even when n is just a few thousand, this means inverting a matrix containing millions or tens of millions of floating point numbers. That's possible on a typical personal computer, but it's hard to go much further. In this post, I describe how to compute PageRank for collections containing millions of webpages. My little laptop easily coped with two million pages, using about 650 megabytes of RAM and a few hours of computation.
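To make the scaling problem concrete, here is a minimal sketch of the standard iterative alternative to matrix inversion: power iteration, which only needs to walk the link structure on each pass rather than materialize and invert an n × n matrix. The damping parameter s = 0.85 and the uniform treatment of dangling pages are conventional choices I'm assuming here, not details quoted from this post.

```python
import numpy as np

def pagerank(links, n, s=0.85, tol=1e-8, max_iter=100):
    """Compute PageRank by power iteration.

    links: dict mapping each page index to the list of pages it links to.
    n: total number of pages.
    s: damping factor (probability of following a link rather than teleporting).
    """
    p = np.full(n, 1.0 / n)  # start from the uniform distribution
    out_degree = {j: len(targets) for j, targets in links.items()}
    for _ in range(max_iter):
        new_p = np.zeros(n)
        dangling = 0.0
        for j in range(n):
            if out_degree.get(j, 0) == 0:
                dangling += p[j]  # dangling pages spread their rank uniformly
            else:
                share = p[j] / out_degree[j]
                for k in links[j]:
                    new_p[k] += share
        new_p = s * (new_p + dangling / n) + (1 - s) / n
        if np.abs(new_p - p).sum() < tol:  # stop once the distribution settles
            return new_p
        p = new_p
    return p

# Tiny example: page 0 links to 1 and 2, page 1 to 2, page 2 back to 0.
ranks = pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)
```

Each iteration costs time proportional to the number of links rather than n², which is what makes the millions-of-pages regime reachable on a laptop.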