Semantic similarity based on corpus statistics and lexical taxonomy

J. Jiang and D. Conrath. Proceedings of the International Conference on Research in Computational Linguistics (ROCLING), Taiwan, (1997)

Abstract

This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.

Description

Jiang-Conrath measure
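
As a rough illustration of how information content and taxonomy structure can be combined, the sketch below uses the commonly cited simplified form of the Jiang-Conrath distance, dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lcs), where lcs is the lowest common subsumer of the two concepts. The toy taxonomy, the corpus counts, and all function names are hypothetical, and any additional weighting factors the paper applies are left out.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts for each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 10, "vehicle": 10, "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the path from a concept up to the root, concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def information_content():
    """IC(c) = -log p(c), where p(c) counts c and everything below it."""
    cumulative = dict.fromkeys(PARENT, 0)
    for concept, count in COUNTS.items():
        for node in ancestors(concept):
            cumulative[node] += count
    total = sum(COUNTS.values())
    return {c: -math.log(cumulative[c] / total) for c in PARENT}


def jcn_distance(c1, c2, ic):
    """Simplified distance: IC(c1) + IC(c2) - 2 * IC(lowest common subsumer)."""
    seen = set(ancestors(c1))
    # In a tree, the first shared node on the upward path is the lowest one.
    lcs = next(node for node in ancestors(c2) if node in seen)
    return ic[c1] + ic[c2] - 2 * ic[lcs]


ic = information_content()
print(jcn_distance("dog", "cat", ic))  # siblings under "animal"
print(jcn_distance("dog", "car", ic))  # only share the root
```

With these made-up counts, "dog" and "cat" come out closer than "dog" and "car", because their common subsumer "animal" is more specific (higher information content) than the root.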

Tags

community