Unsupervised named-entity extraction from the web: an experimental study

O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, and A. Yates. Artif. Intell., 165 (1): 91--134 (2005)
DOI: http://dx.doi.org/10.1016/j.artint.2005.03.001

Abstract

The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.

Description

Unsupervised named-entity extraction from the web

Links and resources

BibTeX key: 1090487
entry type: article
address: Essex, UK
year: 2005
journal: Artif. Intell.
number: 1
pages: 91--134
publisher: Elsevier Science Publishers Ltd.
volume: 165
issn: 0004-3702
DOI: http://dx.doi.org/10.1016/j.artint.2005.03.001
url: http://portal.acm.org/citation.cfm?id=1090483.1090487

Cite this publication

%0 Journal Article
%1 1090487
%A Etzioni, Oren
%A Cafarella, Michael
%A Downey, Doug
%A Popescu, Ana-Maria
%A Shaked, Tal
%A Soderland, Stephen
%A Weld, Daniel S.
%A Yates, Alexander
%C Essex, UK
%D 2005
%I Elsevier Science Publishers Ltd.
%J Artif. Intell.
%K information-extraction machine-learning unsupervised-learning web
%N 1
%P 91--134
%R http://dx.doi.org/10.1016/j.artint.2005.03.001
%T Unsupervised named-entity extraction from the web: an experimental study
%U http://portal.acm.org/citation.cfm?id=1090483.1090487
%V 165
%X The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.

@article{1090487,
  abstract = {The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., "chemist" and "biologist" are identified as sub-classes of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.},
  added-at = {2009-11-18T17:04:15.000+0100},
  address = {Essex, UK},
  author = {Etzioni, Oren and Cafarella, Michael and Downey, Doug and Popescu, Ana-Maria and Shaked, Tal and Soderland, Stephen and Weld, Daniel S. and Yates, Alexander},
  biburl = {https://www.bibsonomy.org/bibtex/2d6c03d4a71ce887c03e2eb4f81c29a0c/gromgull},
  description = {Unsupervised named-entity extraction from the web},
  doi = {http://dx.doi.org/10.1016/j.artint.2005.03.001},
  interhash = {a3e3a9914288231e411567cae1547486},
  intrahash = {d6c03d4a71ce887c03e2eb4f81c29a0c},
  issn = {0004-3702},
  journal = {Artif. Intell.},
  keywords = {information-extraction machine-learning unsupervised-learning web},
  number = 1,
  pages = {91--134},
  publisher = {Elsevier Science Publishers Ltd.},
  timestamp = {2009-11-18T17:04:15.000+0100},
  title = {Unsupervised named-entity extraction from the web: an experimental study},
  url = {http://portal.acm.org/citation.cfm?id=1090483.1090487},
  volume = 165,
  year = 2005
}
