WikiWalk: random walks on Wikipedia for semantic relatedness
E. Yeh, D. Ramage, C. Manning, E. Agirre, and A. Soroa. Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, pages 41-49. Stroudsburg, PA, USA, Association for Computational Linguistics (2009)
Abstract
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
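As a rough illustration of the idea the abstract describes, the following minimal sketch runs personalized PageRank by power iteration on a tiny hand-made link graph and compares the resulting distributions. It is not the authors' implementation: the toy graph, the function names, the damping value of 0.85, and the use of cosine similarity to compare distributions are all assumptions made for this example.

```python
import math

def personalized_pagerank(graph, teleport, damping=0.85, iterations=50):
    """Power iteration for personalized PageRank on a dict-of-lists graph.

    Every link target must also appear as a key of `graph`.
    """
    nodes = list(graph)
    total = sum(teleport.values())
    # Normalize the teleport vector so it is a probability distribution.
    t = {n: teleport.get(n, 0.0) / total for n in nodes}
    rank = dict(t)
    for _ in range(iterations):
        # Teleport step: restart at the seed nodes with probability 1 - damping.
        nxt = {n: (1.0 - damping) * t[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Walk step: split this node's mass evenly over its out-links.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the teleport distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * t[m]
        rank = nxt
    return rank

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy link graph standing in for a Wikipedia-derived graph.
graph = {
    "Cat": ["Pet", "Mammal"],
    "Dog": ["Pet", "Mammal"],
    "Pet": ["Cat", "Dog"],
    "Mammal": ["Cat", "Dog"],
    "Car": ["Engine"],
    "Engine": ["Car"],
}

# Each input word is represented as a teleport distribution over graph nodes,
# here by a trivial one-node lookup (an assumption of this sketch).
cat = personalized_pagerank(graph, {"Cat": 1.0})
dog = personalized_pagerank(graph, {"Dog": 1.0})
car = personalized_pagerank(graph, {"Car": 1.0})

print("cat vs dog:", cosine(cat, dog))
print("cat vs car:", cosine(cat, car))
```

In this toy graph, "Cat" and "Dog" share neighbors, so their walk distributions overlap, while "Car" sits in a separate component and shares no probability mass with either.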
%0 Conference Paper
%1 yeh2009wikiwalk
%A Yeh, Eric
%A Ramage, Daniel
%A Manning, Christopher D.
%A Agirre, Eneko
%A Soroa, Aitor
%B Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
%C Stroudsburg, PA, USA
%D 2009
%I Association for Computational Linguistics
%K random relatedness semantics walks wikipedia
%P 41--49
%T WikiWalk: random walks on Wikipedia for semantic relatedness
%U http://dl.acm.org/citation.cfm?id=1708124.1708133
%X Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
%@ 978-1-932432-54-1
@inproceedings{yeh2009wikiwalk,
abstract = {Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.},
acmid = {1708133},
added-at = {2017-01-12T08:49:58.000+0100},
address = {Stroudsburg, PA, USA},
author = {Yeh, Eric and Ramage, Daniel and Manning, Christopher D. and Agirre, Eneko and Soroa, Aitor},
biburl = {https://www.bibsonomy.org/bibtex/2ffd20a7357ca8e87d46e516589a7769e/thoni},
booktitle = {Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing},
description = {WikiWalk},
interhash = {8b28cd800b6ad3929eef3b45de997e51},
intrahash = {ffd20a7357ca8e87d46e516589a7769e},
isbn = {978-1-932432-54-1},
keywords = {random relatedness semantics walks wikipedia},
location = {Suntec, Singapore},
numpages = {9},
pages = {41--49},
publisher = {Association for Computational Linguistics},
series = {TextGraphs-4},
timestamp = {2017-01-12T08:49:58.000+0100},
title = {{WikiWalk}: random walks on {Wikipedia} for semantic relatedness},
url = {http://dl.acm.org/citation.cfm?id=1708124.1708133},
year = 2009
}