Using Web Search Logs to Identify Query Classification Terms

ITNG '07: Proceedings of the International Conference on Information Technology, pages 469-474. Washington, DC, USA, IEEE Computer Society, (2007)
DOI: http://dx.doi.org/10.1109/ITNG.2007.202

Abstract

Classification of search queries is a complex and computationally challenging task. Typically, search queries are short, reveal very few features per single query and are therefore a weak source for traditional machine learning. In this paper, we present a method that combines limited manual labeling, computational linguistics and information retrieval to classify a large collection of web search queries. A short set of manually chosen terms that are known a priori to be of interest to a particular class is used to cull a small number of actual queries from a commercial search engine log. These queries are then submitted to a commercial search engine and the returned search results are used to find more class related terms. We examine classification proficiency of the proposed method on a large web search engine query log and show that up to 48% of the unlabeled set could be classified using this method. We discuss results of this research and its implications on the advancement of short text classification.
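
The pipeline outlined in the abstract (seed terms, culled queries, search results, expanded terms) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the query log, the stopword list, the `fake_search` stand-in for a commercial search engine, and the frequency-based term selection are all assumptions made for illustration.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]

def cull_queries(query_log, terms):
    """Return the log queries that contain at least one of the given terms."""
    terms = set(terms)
    return [q for q in query_log if terms & set(tokenize(q))]

def expand_terms(seed_queries, search, seed_terms, stopwords, top_k=1):
    """Count terms in the results returned for the seed queries and
    keep the top_k most frequent ones that are not already known."""
    counts = Counter()
    for query in seed_queries:
        for snippet in search(query):
            counts.update(
                t for t in tokenize(snippet)
                if t not in stopwords and t not in seed_terms
            )
    return [term for term, _ in counts.most_common(top_k)]

# Hypothetical toy data standing in for a real query log and search engine.
QUERY_LOG = [
    "cheap flights paris",
    "hotel deals rome",
    "python list sort",
    "airline baggage fees",
    "rome itinerary",
]
FAKE_RESULTS = {
    "cheap flights paris": [
        "compare airline fares and flights to paris",
        "book airline tickets",
    ],
    "hotel deals rome": [
        "rome hotel booking with itinerary tips",
        "airline and hotel packages",
    ],
}

def fake_search(query):
    """Assumed stand-in: returns canned result snippets for a query."""
    return FAKE_RESULTS.get(query, [])

seed_terms = {"flights", "hotel"}          # manually chosen class terms
stopwords = {"and", "to", "with", "the"}   # assumed minimal stopword list

seed_queries = cull_queries(QUERY_LOG, seed_terms)
new_terms = expand_terms(seed_queries, fake_search, seed_terms, stopwords)
class_terms = seed_terms | set(new_terms)
labeled = cull_queries(QUERY_LOG, class_terms)
coverage = len(labeled) / len(QUERY_LOG)
```

On this toy log the seed terms alone match two of the five queries, and the expanded term set matches three, which mirrors in miniature how term expansion increases the share of the unlabeled set that can be classified.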
