It has been a couple of years since I posted statistics from WorldCat, so here is a new spreadsheet based on an October 1, 2009 snapshot (see the earlier post for an explanation of the table). WorldCat has changed dramatically...
This is the first of a three-part series called TFIDF In Libraries, which introduces “relevancy ranking.” This part describes term frequency/inverse document frequency (TFIDF), a common mathematical method of weighting texts for automatic classification and for sorting search results.
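As a rough illustration of the idea, here is a minimal Python sketch of one common textbook form of TFIDF: term frequency is the share of a document's words that match the term, and inverse document frequency is the natural log of the number of documents divided by the number that contain the term. The function names `tfidf` and `rank`, the toy corpus, and the choice of this particular variant are assumptions made for the example, not necessarily the formulation the series uses.

```python
import math
from collections import Counter


def tfidf(term, doc, corpus):
    """Score `term` for `doc` (a list of words) against `corpus` (a list of such lists)."""
    if not doc:
        return 0.0
    # Term frequency: share of the document's words that are `term`.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: how many documents contain `term` at all.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    # Inverse document frequency: terms found in every document score 0.
    idf = math.log(len(corpus) / df)
    return tf * idf


def rank(term, corpus):
    """Return indexes of matching documents, highest score first (ties by index)."""
    scored = [(tfidf(term, doc, corpus), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]


# Hypothetical three-document corpus, already split into lowercase words.
corpus = [
    "the library catalog lists books".split(),
    "the catalog search ranks results".split(),
    "the books are on the shelf".split(),
]

print(rank("shelf", corpus))
print(rank("the", corpus))
```

In this sketch a word such as "the", which appears in every document, gets an inverse document frequency of zero and so contributes nothing to the ranking, while a word found in only one document pushes that document to the top.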
Wordnik is based on the principle that people learn words best by seeing them in context. We've collected more than 4 billion words of text (web pages, books, magazines, newspapers, etc.) and have mined them exhaustively to show you example sentences for any word you're interested in.