The Indexable Web is more than 11.5 billion pages

by: Antonio Gulli and Alessio Signorini
(2005).
Abstract

What is the current size of the Web? At the time of this writing, Google claims to index more than 8 billion pages, MSN Beta claims about 5 billion pages, Yahoo! at least 4 billion, and Ask/Teoma more than 2 billion. Two sources for tracking the growth of the Web are [6, 7], although they are not kept up to date. Estimating the size of the whole Web is quite difficult, due to its dynamic nature. (According to Andrei Broder, the size of the whole Web depends strongly on whether his laptop is on the Web, since it can be configured to produce links to an infinite number of URLs!) Nevertheless, it is possible to assess the size of the publicly indexable Web. The indexable Web [4] is defined as "the part of the Web which is considered for indexing by the major engines". In 1997, Bharat and Broder [2] estimated the size of the Web indexed by Hotbot, Altavista, Excite and Infoseek (the largest search engines at that time) at 200 million pages. They also pointed out that the estimated intersection of the indexes was less than 1.4%, or about 2.2 million pages. Furthermore, in 1998, Lawrence and Giles [3] gave a lower bound of 800 million pages. These estimates have now become obsolete. In this short paper, we revise and update the estimated size of the indexable Web to at least 11.5 billion pages as of the end of January 2005. We also estimate the relative size and overlap of the largest Web search engines. Precisely, Google is the largest engine, followed by Yahoo!, by Ask/Teoma, and by MSN Beta. We adopted the methodology proposed in 1997 by Bharat and Broder [2], but extended the number of queries used for testing from 35,000 in English to more than 438,141 in 75 different languages. We remark that an estimate of the size of the Web is useful in many situations, such as when compressing, ranking, spidering, indexing and mining the Web.
