Automatically creating datasets for measures of semantic relatedness

Abstract

Semantic relatedness is a special form of linguistic distance between words. Semantic relatedness measures are usually evaluated by comparison with human judgments. Previous test datasets were created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, the test datasets cover all types of lexical-semantic relations and contain domain-specific words that occur naturally in texts.

BibTeX key: Zesch:2006:ACD:1641976.1641980
entry type: inproceedings
address: Stroudsburg, PA, USA
booktitle: Proceedings of the Workshop on Linguistic Distances
year: 2006
pages: 16--24
publisher: Association for Computational Linguistics
series: LD '06
location: Sydney, Australia
acmid: 1641980
isbn: 1-932432-83-3
numpages: 9
url: http://dl.acm.org/citation.cfm?id=1641976.1641980
