Semantic text similarity using corpus-based word similarity and string similarity

ACM Transactions on Knowledge Discovery from Data, 2 (2): 1--25 (July 2008)
DOI: http://dx.doi.org/10.1145/1376815.1376819

Abstract

We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
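
The string-matching half of the approach can be illustrated with a small sketch. The abstract does not spell out the normalization or the modifications applied to LCS, so the following is only an assumption: it computes the plain LCS length by dynamic programming and normalizes it as the squared LCS length divided by the product of the two string lengths, which gives a score in [0, 1]. The function names `lcs_length` and `normalized_lcs` are hypothetical, and the corpus-based word similarity component is not covered here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence that ended just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: len(LCS)^2 / (len(a) * len(b)), in [0, 1]."""
    if not a or not b:
        return 0.0  # avoid division by zero; empty strings share nothing
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


if __name__ == "__main__":
    print(normalized_lcs("kitten", "sitting"))
```

Squaring the LCS length penalizes pairs where a short common subsequence is shared by two long strings, so identical strings score 1.0 and strings with no characters in common score 0.0.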
