"Only one third of a search engine is devoted to fulfilling search requests. The other two thirds are divided between crawling (sending a host of single-minded digital organisms out to gather information) and indexing (building data structures from the results)." By George Dyson
document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods. OCRopus is development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts.
Joshua Bloch about designing APIs and why it matters heavily for a the success of a company or a projects. Mostly common sense but useful if you get into a debate about it
Tutorial Slides by Andrew Moore. The links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.
an application that uses the power of Google's global computer network to make web pages load faster, local cache, prefetching, data compression. Currently a Google Labs product. (probably good for very slow connections and as a general proxy)
Instead of using a piece of paper, your calculator, or a computer math software program, you can now solve mathematical problems with Google’s built-in calculator function.