Extracting patterns and relations from the World Wide Web
S. Brin. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98, pages 172-183. (1998)
Abstract
The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.
Description
CiteSeerX — Extracting patterns and relations from the world wide web
%0 Conference Paper
%1 dipre
%A Brin, Sergey
%B WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98
%D 1998
%K based dipre extraction regexp relation seed semi supervised
%P 172--183
%T Extracting patterns and relations from the World Wide Web
%U http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.3197
%X The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.
@inproceedings{dipre,
abstract = {The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.},
added-at = {2012-11-04T18:35:49.000+0100},
author = {Brin, Sergey},
biburl = {https://www.bibsonomy.org/bibtex/2b61902b899c37114fbde944f09727d84/jil},
booktitle = {WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98},
description = {CiteSeerX — Extracting patterns and relations from the world wide web},
interhash = {09602a4694d90a0d736be8c01291b4ee},
intrahash = {b61902b899c37114fbde944f09727d84},
keywords = {based dipre extraction regexp relation seed semi supervised},
pages = {172--183},
timestamp = {2013-11-23T20:11:51.000+0100},
title = {Extracting patterns and relations from the {World Wide Web}},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.3197},
year = 1998
}