The PageRank algorithm is a great way of using collective intelligence to determine the importance of a webpage. There's a big problem, though: PageRank is difficult to apply to the web as a whole, simply because the web contains so many webpages. While just a few lines of code can be used to implement PageRank on collections of a few thousand webpages, it's trickier to compute PageRank for larger sets of pages. The underlying problem is that the most direct way to compute the PageRank of n webpages involves inverting an n × n matrix. Even when n is just a few thousand, this means inverting a matrix containing millions or tens of millions of floating point numbers. This is possible on a typical personal computer, but it's hard to go much further. In this post, I describe how to compute PageRank for collections containing millions of webpages. My little laptop easily coped with two million pages, using about 650 megabytes of RAM and a few hours of computation.
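The key to scaling past matrix inversion is iteration: rather than invert the matrix, one can repeatedly apply the PageRank update to a sparse link structure until the ranks settle. A minimal sketch of this power-iteration approach (illustrative only, not the post's code; the damping factor and iteration count are conventional assumptions):

```python
# Power-iteration PageRank over a sparse link structure.
# `links` maps each page to the list of pages it links to.
def pagerank(links, damping=0.85, iterations=50):
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Every page gets a baseline share of (1 - damping).
        new_rank = {page: (1.0 - damping) / n for page in links}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over all pages.
                for target in new_rank:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Tiny three-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}.
ranks = pagerank({0: [1, 2], 1: [2], 2: [0]})
```

Because only existing links are touched, memory scales with the number of links rather than with n², which is what makes millions of pages feasible on a laptop.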
Datatype-Generic Programming. Roland Backhouse at the University of Nottingham and Jeremy Gibbons at the University of Oxford have a joint EPSRC-supported project entitled Datatype-Generic Programming, running for three years starting on 1st October 2003. The aim of the project is to develop a novel mechanism for parametrizing programs, namely parametrization by a datatype or type constructor. The mechanism is related to parametric polymorphism, but is of higher order. We aim to develop a calculus for constructing datatype-generic programs, with the ultimate goal of improving the state of the art in generic object-oriented programming, as occurs for example in the C++ Standard Template Library. Further details of the project can be obtained from the contacts listed below.
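To give a flavour of the idea, a datatype-generic program is parametrized not by an element type but by the shape of the container itself. Python has no higher-kinded types, but the pattern can be approximated by passing in the fold for each datatype; the names below are illustrative, not from the project:

```python
# One generic counting function, parametrized by the datatype's fold.
def generic_count(fold, container):
    return fold(lambda acc, _x: acc + 1, 0, container)

def list_fold(f, z, xs):
    # Fold over the list datatype.
    acc = z
    for x in xs:
        acc = f(acc, x)
    return acc

def tree_fold(f, z, tree):
    # Fold over a rose-tree datatype: (value, [children]).
    value, children = tree
    acc = f(z, value)
    for child in children:
        acc = tree_fold(f, acc, child)
    return acc

n_list = generic_count(list_fold, [1, 2, 3])
n_tree = generic_count(tree_fold, (1, [(2, []), (3, [(4, [])])]))
```

The same `generic_count` works over lists and trees alike; only the fold, which encodes the datatype's structure, varies.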
As part of a larger project, we have built a declarative assembly language. This language enables us to specify multiple code paths to compute particular quantities, giving the instruction scheduler more flexibility in balancing execution resources for superscalar execution. The instruction scheduler is also innovative in that it includes aggressive pipelining, and exhaustive (but lazy) search for optimal instruction schedules. We present some examples where our approach has produced very promising results. I think this paper is a nice follow-up to the recent discussion of SPE because it goes further than that paper by analyzing what data are necessary to achieve the ultimate goal of optimal or near-optimal instruction scheduling on superscalar architectures. In other words, it strongly suggests that we can do better than simply embedding low-level instructions in a high-level language by instead embedding a graph of desired results (vertices) and instructions for reaching them.
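The graph-of-results idea can be sketched very simply: if instructions name the result they produce and the results they consume, a scheduler is free to emit them in any dependency-respecting order. The toy below uses a greedy list scheduler, far weaker than the paper's exhaustive search, and all names are illustrative:

```python
# Schedule instructions given as (result_name, input_names) pairs,
# emitting any instruction whose inputs have already been produced.
def schedule(instructions):
    ready = set()       # results produced so far
    order = []          # emitted schedule
    remaining = list(instructions)
    while remaining:
        progress = False
        for name, inputs in list(remaining):
            if all(i in ready for i in inputs):
                order.append(name)
                ready.add(name)
                remaining.remove((name, inputs))
                progress = True
        if not progress:
            raise ValueError("cyclic dependency in result graph")
    return order

# c needs a and b; b needs a; a needs nothing.
order = schedule([("c", ("a", "b")), ("a", ()), ("b", ("a",))])
```

A real scheduler would additionally weigh execution resources and latencies when several instructions are ready at once; the point here is only that the graph, not a fixed instruction sequence, is the unit being embedded.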
My thesis clarified the relationships between existing dialects of ML and studied extending ML with support for recursive modules. In it, I solved a critical problem involving the interaction of recursion and data abstraction, known as the double vision problem. In other work with Bob Harper and Manuel Chakravarty, I showed how to extend ML with support for overloading and ad hoc polymorphism through Haskell-style type classes. This work exposes connections between ML modules and Haskell type classes, resulting in a unifying account of the two features. Andreas and I have developed a novel module system design that addresses one of the key remaining problems with recursive modules, namely separate compilation. This design seamlessly integrates elements of both traditional ML module systems and Bracha-style mixin modules, resulting in a minimalist account of the ML module system that unifies their features. A prototype is available.
Does anyone know of work done on co-data-types? There has been a lot of work done on co-data and stream programming, but the assumption is usually of a very simple repetitive data stream. Has anyone done any work on highly structured co-data or co-data-types? A data structure passed into a function represents a limit on all of its constituent types; that is, the structure must be fully constructed (AND logic) before the function can be called. But a co-data structure (co-type) passed into a function represents a co-limit and only needs to be partially constructed (OR logic); perhaps the function can even return partial (co-data) results based upon the co-data-type it receives. This seems to me a very relevant topic for advancing stream programming and the exploitation of multiple layers of data parallelism.
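The "partially constructed input, partial results out" behaviour can be illustrated with Python generators, which are consumed lazily. A minimal sketch (the names are illustrative; generators only approximate true co-data):

```python
from itertools import islice

def naturals():
    # An infinite stream: never fully constructed.
    n = 0
    while True:
        yield n
        n += 1

def running_sum(stream):
    # Yields partial results: each output needs only a finite
    # prefix of the (possibly infinite) input stream.
    total = 0
    for x in stream:
        total += x
        yield total

# Consume just the first five partial sums of an infinite input.
first = list(islice(running_sum(naturals()), 5))
```

Here `running_sum` is called on an input that can never be fully built, yet it returns useful partial results on demand, which is exactly the OR-logic behaviour described above.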