Unsupervised Methods for Determining Object and Relation Synonyms on the Web
A. Yates and O. Etzioni. J. Artif. Int. Res., 34 (1): 255--296 (March 2009)
Abstract
The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called RESOLVER, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, RESOLVER resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of RESOLVER's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic RESOLVER system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.
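The abstract gives precision and recall for the Web experiments but not the F1 they imply. This short Python sketch combines them under one assumption: that the standard balanced F-measure, F1 = 2PR / (P + R), is applied to the aggregate figures. The function name `f1` is illustrative and does not come from the paper. The paper may compute F1 differently, so treat these values as a rough reading of the abstract and not as reported results.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Aggregate precision/recall figures as stated in the abstract.
reported = {
    "objects": (0.78, 0.68),
    "relations": (0.90, 0.35),
}

for kind, (p, r) in reported.items():
    print(f"{kind}: F1 = {f1(p, r):.3f}")
```

Under that assumption, objects come out at about 0.727 and relations at about 0.504. This makes the low relation recall concrete: high precision alone does not lift the combined score.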
%0 Journal Article
%1 yates2009
%A Yates, Alexander
%A Etzioni, Oren
%C USA
%D 2009
%I AI Access Foundation
%J J. Artif. Int. Res.
%K 2009 extension extraction relation resolver textrunner unsupervised yates
%N 1
%P 255--296
%T Unsupervised Methods for Determining Object and Relation Synonyms on the Web
%U http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.3420&rep=rep1&type=pdf
%V 34
%X The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called RESOLVER, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, RESOLVER resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of RESOLVER's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic RESOLVER system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.
@article{yates2009,
abstract = {The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called RESOLVER, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, RESOLVER resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of RESOLVER's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic RESOLVER system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.},
acmid = {1622724},
added-at = {2014-01-05T16:40:56.000+0100},
address = {USA},
author = {Yates, Alexander and Etzioni, Oren},
biburl = {https://www.bibsonomy.org/bibtex/21bd33c34f6f19468897b9688ce792df9/jil},
interhash = {37c084ce21a33c31b5a62530145c5a08},
intrahash = {1bd33c34f6f19468897b9688ce792df9},
issn = {1076-9757},
issue_date = {January 2009},
journal = {J. Artif. Int. Res.},
keywords = {2009 extension extraction relation resolver textrunner unsupervised yates},
month = mar,
number = 1,
numpages = {42},
pages = {255--296},
publisher = {AI Access Foundation},
timestamp = {2014-01-05T16:43:21.000+0100},
title = {Unsupervised Methods for Determining Object and Relation Synonyms on the Web},
url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.3420&rep=rep1&type=pdf},
volume = 34,
year = 2009
}