
A Comparison of String Distance Metrics for Name-Matching Tasks.

, , and . Proceedings of the IJCAI-03 Workshop on Information Integration, pages 73--78 (August 2003).

Abstract

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme that combines TFIDF weighting, widely used in information retrieval, with the Jaro-Winkler string-distance scheme, developed in the probabilistic record linkage community.
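The abstract names Jaro-Winkler and a TFIDF/Jaro-Winkler hybrid but does not spell out how either one works. The following is a minimal Python sketch of both. It is not the code of the Java toolkit used in the paper. The hybrid follows the general idea often called SoftTFIDF: each token of one name earns credit against its most similar token in the other name, provided their Jaro-Winkler similarity exceeds a threshold θ.

All of the following are assumptions chosen for illustration: the TF-IDF weighting (log(tf+1)·log(N/df), normalized to unit length), the Winkler prefix bonus (p = 0.1 over at most four characters), θ = 0.9, and the helper names `jaro`, `jaro_winkler`, `tfidf_vectors` and `soft_tfidf`.

```python
import math
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only "match" if they are equal and within this distance.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    a = [c for c, u in zip(s1, used1) if u]
    b = [c for c, u in zip(s2, used2) if u]
    t = sum(x != y for x, y in zip(a, b)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed p=0.1, up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Unit-length TF-IDF vectors; the log(tf+1)*log(N/df) weighting is an assumption."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        w = {t: math.log(tf[t] + 1) * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        vecs.append({t: v / norm for t, v in w.items()})
    return vecs


def soft_tfidf(vs: dict[str, float], vt: dict[str, float], theta: float = 0.9) -> float:
    """Hybrid score: each token in vs is credited against its closest token in vt
    (by Jaro-Winkler), but only when that similarity exceeds theta (assumed 0.9)."""
    score = 0.0
    for w, weight_s in vs.items():
        best_tok, best_sim = None, 0.0
        for v in vt:
            sim = jaro_winkler(w, v)
            if sim > best_sim:
                best_tok, best_sim = v, sim
        if best_tok is not None and best_sim > theta:
            score += weight_s * vt[best_tok] * best_sim
    return score


if __name__ == "__main__":
    # Hypothetical toy corpus of tokenized names.
    corpus = [["john", "smith"], ["jon", "smith"], ["mary", "jones"], ["peter", "brown"]]
    vecs = tfidf_vectors(corpus)
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(round(soft_tfidf(vecs[0], vecs[1]), 4))
    print(soft_tfidf(vecs[0], vecs[2]))
```

On this toy corpus, plain TF-IDF cosine similarity between "john smith" and "jon smith" counts only the shared token "smith", which gives 0.2. The soft version also credits the near-match john/jon (Jaro-Winkler ≈ 0.93), so the score rises to about 0.95. Unrelated names still score 0.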
