Abstract

Stemming is a mandatory step in processing both web documents and queries during retrieval. Yet web users' information needs remain hard to infer from their short queries, which is why many papers on query reformulation and expansion have attempted to fill this gap using different approaches. All of that work passes through standard text preparation steps such as tokenising, stop-word removal and stemming. We aim to test the effectiveness of these measures: our work examines the impact of omitting stemming on dimension reduction techniques such as latent semantic analysis, in the context of selecting query expansion terms with statistical latent semantic indexing. Finally, we discuss the results from a corpus-linguistic point of view and outline further improvements.
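As a rough illustration of why stemming matters for dimension reduction, the sketch below builds a term-document matrix for three toy documents with and without stemming. Everything in it is an assumption made for illustration and is not taken from the paper: the documents, the stop-word list, and the `toy_stem` function, which is a deliberately crude suffix stripper standing in for a real stemmer such as Porter's. Latent semantic analysis would then factor the resulting matrix; that step is left out here.

```python
from collections import Counter

# Hypothetical stop-word list, far smaller than any real one.
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "for", "to"}

# Hypothetical suffix rules: (suffix to strip, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("es", ""), ("s", "")]


def toy_stem(token):
    """Crude suffix stripper used only for illustration (not Porter)."""
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three characters of the original token.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + replacement
    return token


def preprocess(text, stem):
    """Tokenise on whitespace, drop stop words, optionally stem."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens] if stem else tokens


def term_document_matrix(docs, stem):
    """Return (vocabulary, matrix) with one row per term, one column per doc."""
    counts = [Counter(preprocess(d, stem)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Toy corpus, invented for this example.
docs = [
    "searching the web for queries",
    "a web search query",
    "expanding queries in web searches",
]

vocab_raw, matrix_raw = term_document_matrix(docs, stem=False)
vocab_stem, matrix_stem = term_document_matrix(docs, stem=True)

print(len(vocab_raw), vocab_raw)
print(len(vocab_stem), vocab_stem)
```

Without stemming, inflected forms such as "search", "searches" and "searching" occupy separate rows, so the matrix handed to the dimension reduction step is larger and sparser. With stemming they collapse into one row, which changes which terms end up close together in the reduced space and therefore which terms are candidates for query expansion.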
