You Can't Beat Frequency (Unless You Use Linguistic Knowledge) -- A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
J. Wermter and U. Hahn. 44th Annual Meeting of the Association for Computational Linguistics, pages 785--792. Sydney, Australia, Association for Computational Linguistics (July 2006)
Abstract
In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.
%0 Conference Paper
%1 WermterHahn06
%A Wermter, Joachim
%A Hahn, Udo
%B 44th Annual Meeting of the Association for Computational Linguistics
%C Sydney, Australia
%D 2006
%I Association for Computational Linguistics
%K collocation corpus extraction frequency lpm lsm ngram term
%P 785--792
%T You Can't Beat Frequency (Unless You Use Linguistic Knowledge) -- A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
%U http://acl.ldc.upenn.edu/P/P06/P06-1099.pdf
%X In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.
@inproceedings{WermterHahn06,
abstract = {In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.},
added-at = {2008-03-11T18:22:21.000+0100},
address = {Sydney, Australia},
author = {Wermter, Joachim and Hahn, Udo},
biburl = {https://www.bibsonomy.org/bibtex/2a10f28fcbe73d29866932710c2a80047/goalscoringsuperstarhero},
booktitle = {44th Annual Meeting of the Association for Computational Linguistics},
interhash = {2eb930cfd69cfefa66f140363b777f48},
intrahash = {a10f28fcbe73d29866932710c2a80047},
keywords = {collocation corpus extraction frequency lpm lsm ngram term},
month = jul,
pages = {785--792},
publisher = {Association for Computational Linguistics},
timestamp = {2008-03-11T18:22:21.000+0100},
title = {You Can't Beat Frequency (Unless You Use Linguistic Knowledge) -- A Qualitative Evaluation of Association Measures for Collocation and Term Extraction},
url = {http://acl.ldc.upenn.edu/P/P06/P06-1099.pdf},
year = 2006
}