M. Jarmasz and S. Szpakowicz. Roget's thesaurus and semantic similarity. Conference on Recent Advances in Natural Language Processing, pages 212–219. (2003)
Abstract
We have implemented a system that measures semantic similarity using
a computerized 1987 Roget's Thesaurus, and evaluated it by performing
a few typical tests. We compare the results of these tests with those
produced by WordNet-based similarity measures. One of the benchmarks
is Miller and Charles' list of 30 noun pairs to which human judges
had assigned similarity measures. We correlate these measures with
those computed by several NLP systems. The 30 pairs can be traced
back to Rubenstein and Goodenough's 65 pairs, which we have also
studied. Our Roget's-based system gets correlations of .878 for the
smaller and .818 for the larger list of noun pairs; this is quite
close to the .885 that Resnik obtained when he employed humans to
replicate the Miller and Charles experiment. We further evaluate
our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL
and 300 Reader's Digest questions: the correct synonym must be selected
amongst a group of four words. Our system gets 78.75%, 82.00% and
74.33% of the questions respectively.
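The abstract describes two evaluations: correlating system similarity scores with human judgments on the Miller and Charles and Rubenstein and Goodenough noun pairs, and answering four-way synonym questions, presumably by choosing the candidate the measure rates most similar to the question word. The following is a minimal Python sketch of that evaluation loop only, assuming Pearson's r as the correlation coefficient and a pluggable similarity function. The names `Similarity`, `choose_synonym`, `percent_correct` and `toy_sim`, and the toy scores, are hypothetical illustrations, not the authors' Roget's-based measure or their code. As an arithmetic check, the reported 78.75%, 82.00% and 74.33% correspond to 63 of 80, 41 of 50 and 223 of 300 questions answered correctly.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interface: any function scoring how similar two words are.
# The paper's Roget's-based measure would plug in here; it is NOT implemented.
Similarity = Callable[[str, str], float]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r between system scores and human judgments (assumed metric)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def choose_synonym(word: str, candidates: Sequence[str], sim: Similarity) -> str:
    """Return the candidate scored most similar to `word`.

    max() keeps the first of several equal maxima, so ties go to the
    earliest candidate; this tie rule is an arbitrary choice of the sketch.
    """
    return max(candidates, key=lambda c: sim(word, c))


def percent_correct(
    questions: Sequence[Tuple[str, Sequence[str], str]], sim: Similarity
) -> float:
    """Score (word, candidates, gold_answer) questions; return a percentage."""
    correct = sum(
        1 for word, cands, gold in questions if choose_synonym(word, cands, sim) == gold
    )
    return round(100 * correct / len(questions), 2)


# Toy scores invented purely for illustration (not from the paper).
TOY_SCORES = {("big", "large"): 16, ("big", "small"): 4, ("big", "red"): 2, ("big", "fast"): 6}


def toy_sim(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0))


if __name__ == "__main__":
    print(choose_synonym("big", ["small", "large", "red", "fast"], toy_sim))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Keeping the similarity function as a parameter lets the same harness score a thesaurus-based measure and WordNet-based measures side by side, which mirrors the comparison the abstract reports.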
%0 Conference Paper
%1 Jarmasz2003
%A Jarmasz, Mario
%A Szpakowicz, Stan
%B Conference on Recent Advances in Natural Language Processing
%D 2003
%K knowledge nlp ontology thesaurus
%P 212--219
%T Roget's thesaurus and semantic similarity
%U http://www.site.uottawa.ca/~mjarmasz/pubs/jarmasz_roget_sim.pdf
%X We have implemented a system that measures semantic similarity using
a computerized 1987 Roget's Thesaurus, and evaluated it by performing
a few typical tests. We compare the results of these tests with those
produced by WordNet-based similarity measures. One of the benchmarks
is Miller and Charles' list of 30 noun pairs to which human judges
had assigned similarity measures. We correlate these measures with
those computed by several NLP systems. The 30 pairs can be traced
back to Rubenstein and Goodenough's 65 pairs, which we have also
studied. Our Roget's-based system gets correlations of .878 for the
smaller and .818 for the larger list of noun pairs; this is quite
close to the .885 that Resnik obtained when he employed humans to
replicate the Miller and Charles experiment. We further evaluate
our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL
and 300 Reader's Digest questions: the correct synonym must be selected
amongst a group of four words. Our system gets 78.75%, 82.00% and
74.33% of the questions respectively.
@inproceedings{Jarmasz2003,
abstract = {We have implemented a system that measures semantic similarity using
a computerized 1987 Roget's Thesaurus, and evaluated it by performing
a few typical tests. We compare the results of these tests with those
produced by WordNet-based similarity measures. One of the benchmarks
is Miller and Charles' list of 30 noun pairs to which human judges
had assigned similarity measures. We correlate these measures with
those computed by several NLP systems. The 30 pairs can be traced
back to Rubenstein and Goodenough's 65 pairs, which we have also
studied. Our Roget's-based system gets correlations of .878 for the
smaller and .818 for the larger list of noun pairs; this is quite
close to the .885 that Resnik obtained when he employed humans to
replicate the Miller and Charles experiment. We further evaluate
our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL
and 300 Reader's Digest questions: the correct synonym must be selected
amongst a group of four words. Our system gets 78.75\%, 82.00\% and
74.33\% of the questions respectively.},
added-at = {2008-05-20T11:16:21.000+0200},
author = {Jarmasz, Mario and Szpakowicz, Stan},
biburl = {https://www.bibsonomy.org/bibtex/2acde39a427ef0e7501f07e8b067a88f0/brightbyte},
booktitle = {Conference on Recent Advances in Natural Language Processing},
interhash = {e28cc3a4231e064f44cfdb2e3338aaf3},
intrahash = {acde39a427ef0e7501f07e8b067a88f0},
keywords = {knowledge nlp ontology thesaurus},
owner = {Marco},
pages = {212--219},
timestamp = {2009-01-23T09:58:50.000+0100},
title = {{Roget's Thesaurus} and Semantic Similarity},
url = {http://www.site.uottawa.ca/~mjarmasz/pubs/jarmasz_roget_sim.pdf},
year = 2003
}