The Web is a hypertext body of approximately 300 million pages
that continues to grow at roughly a million pages per day. Page
variation is more prodigious than the data's raw scale: taken as a
whole, the set of Web pages lacks a unifying structure and shows far
more authoring style and content variation than that seen in traditional
text document collections. This level of complexity makes an
“off-the-shelf” database management and information
retrieval solution impossible. To date, index-based search engines for
the Web have been the primary tool by which users search for
information. Such engines can build giant indices that let you quickly
retrieve the set of all Web pages containing a given word or string.
Experienced users can make effective use of such engines for tasks that
can be solved by searching for tightly constrained keywords and
phrases. These search engines are, however, unsuited for a wide range of
equally important tasks. In particular, a topic of any breadth will
typically contain several thousand or even several million relevant Web
pages. How, then, from this sea of pages, should a search engine select the
correct ones, those of most value to the user? Clever is a search engine that
analyzes hyperlinks to uncover two types of pages: authorities, which
provide the best source of information on a given topic; and hubs, which
provide collections of links to authorities. We outline the thinking
that went into Clever's design, report briefly on a study that compared
Clever's performance to that of Yahoo and AltaVista, and examine how our
system is being extended and updated.
%0 Journal Article
%1 Chakrabarti:1999
%A Chakrabarti, S.
%A Dom, B.E.
%A Kumar, S.R.
%A Raghavan, P.
%A Rajagopalan, S.
%A Tomkins, A.
%A Gibson, D.
%A Kleinberg, J.
%B Computer
%D 1999
%K hubs+authority search
%P 60-67
%R 10.1109/2.781636
%T Mining the Web's link structure
%U http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=781636
%V 32
%X The Web is a hypertext body of approximately 300 million pages
that continues to grow at roughly a million pages per day. Page
variation is more prodigious than the data's raw scale: taken as a
whole, the set of Web pages lacks a unifying structure and shows far
more authoring style and content variation than that seen in traditional
text document collections. This level of complexity makes an
“off-the-shelf” database management and information
retrieval solution impossible. To date, index-based search engines for
the Web have been the primary tool by which users search for
information. Such engines can build giant indices that let you quickly
retrieve the set of all Web pages containing a given word or string.
Experienced users can make effective use of such engines for tasks that
can be solved by searching for tightly constrained keywords and
phrases. These search engines are, however, unsuited for a wide range of
equally important tasks. In particular, a topic of any breadth will
typically contain several thousand or even several million relevant Web
pages. How, then, from this sea of pages, should a search engine select the
correct ones, those of most value to the user? Clever is a search engine that
analyzes hyperlinks to uncover two types of pages: authorities, which
provide the best source of information on a given topic; and hubs, which
provide collections of links to authorities. We outline the thinking
that went into Clever's design, report briefly on a study that compared
Clever's performance to that of Yahoo and AltaVista, and examine how our
system is being extended and updated.
@article{Chakrabarti:1999,
abstract = {The Web is a hypertext body of approximately 300 million pages
that continues to grow at roughly a million pages per day. Page
variation is more prodigious than the data's raw scale: taken as a
whole, the set of Web pages lacks a unifying structure and shows far
more authoring style and content variation than that seen in traditional
text document collections. This level of complexity makes an
``off-the-shelf'' database management and information
retrieval solution impossible. To date, index-based search engines for
the Web have been the primary tool by which users search for
information. Such engines can build giant indices that let you quickly
retrieve the set of all Web pages containing a given word or string.
Experienced users can make effective use of such engines for tasks that
can be solved by searching for tightly constrained keywords and
phrases. These search engines are, however, unsuited for a wide range of
equally important tasks. In particular, a topic of any breadth will
typically contain several thousand or even several million relevant Web
pages. How, then, from this sea of pages, should a search engine select the
correct ones, those of most value to the user? Clever is a search engine that
analyzes hyperlinks to uncover two types of pages: authorities, which
provide the best source of information on a given topic; and hubs, which
provide collections of links to authorities. We outline the thinking
that went into Clever's design, report briefly on a study that compared
Clever's performance to that of Yahoo and AltaVista, and examine how our
system is being extended and updated.},
added-at = {2007-09-18T11:19:04.000+0200},
author = {Chakrabarti, S. and Dom, B.E. and Kumar, S.R. and Raghavan, P. and Rajagopalan, S. and Tomkins, A. and Gibson, D. and Kleinberg, J.},
biburl = {https://www.bibsonomy.org/bibtex/26aff9aa0c77da89000edda7d3390e74a/jfreyne},
journal = {Computer},
doi = {10.1109/2.781636},
interhash = {421fd080e487da38c0b1c408972968c1},
intrahash = {6aff9aa0c77da89000edda7d3390e74a},
issn = {0018-9162},
keywords = {hubs+authority search},
pages = {60--67},
timestamp = {2007-09-18T11:19:04.000+0200},
title = {Mining the Web's link structure},
url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=781636},
volume = 32,
year = 1999
}