An evaluation of phrasal and clustered representations on a text categorization task
D. Lewis. SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 37--50. New York, NY, USA, ACM Press, (1992)
DOI: http://doi.acm.org/10.1145/133160.133172
Abstract
Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.
%0 Conference Paper
%1 lewis1992anevaluation
%A Lewis, David D.
%B SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
%C New York, NY, USA
%D 1992
%I ACM Press
%K classification imported msc representation
%P 37--50
%R http://doi.acm.org/10.1145/133160.133172
%T An evaluation of phrasal and clustered representations on a text categorization task
%U http://portal.acm.org/citation.cfm?id=133172&dl=GUIDE
%X Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.
%@ 0-89791-523-2
@inproceedings{lewis1992anevaluation,
abstract = {Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.},
added-at = {2007-08-22T21:25:04.000+0200},
address = {New York, NY, USA},
author = {Lewis, David D.},
biburl = {https://www.bibsonomy.org/bibtex/24f720389c327a642a26c8b2006b0d384/ngrandy},
booktitle = {SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval},
description = {An evaluation of phrasal and clustered representations on a text categorization task},
doi = {10.1145/133160.133172},
interhash = {0462924cc160ca5bbed894019ce5cd47},
intrahash = {4f720389c327a642a26c8b2006b0d384},
isbn = {0-89791-523-2},
keywords = {classification imported msc representation},
location = {Copenhagen, Denmark},
pages = {37--50},
publisher = {ACM Press},
timestamp = {2007-08-22T21:25:04.000+0200},
title = {An evaluation of phrasal and clustered representations on a text categorization task},
url = {http://portal.acm.org/citation.cfm?id=133172&dl=GUIDE},
year = 1992
}