A practical web-based approach to generating topic hierarchy for text segments

CIKM '04: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, pages 127-136. New York, NY, USA, ACM Press, (2004)
DOI: 10.1145/1031171.1031193

Abstract

It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of text segments. Unlike traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on different domains of text segments. The results show the potential of the proposed approach, which we believe can benefit many information systems.
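
The two steps the abstract describes (enriching each short segment with search-result snippets, then clustering hierarchically) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the snippet texts are hard-coded stand-ins for real search results, `vectorize`, `cosine`, and `build_hierarchy` are hypothetical names, and plain average-link agglomerative clustering is swapped in for the paper's own algorithm, which the abstract does not specify. This is the kind of traditional method that yields a strictly binary tree, not the more natural hierarchy the paper aims for.

```python
import math
from collections import Counter

def vectorize(snippets):
    """Pool a segment's snippets into one term-frequency vector."""
    return Counter(word for s in snippets for word in s.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_hierarchy(segment_snippets):
    """Average-link agglomerative clustering; returns a nested tuple tree."""
    vectors = {seg: vectorize(snips) for seg, snips in segment_snippets.items()}
    # Each cluster is (subtree, list of member segments).
    clusters = [(seg, [seg]) for seg in segment_snippets]

    def avg_sim(m1, m2):
        total = sum(cosine(vectors[x], vectors[y]) for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: avg_sim(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical snippets standing in for highly ranked search results.
SNIPPETS = {
    "python": ["python programming language tutorial", "learn python code"],
    "java": ["java programming language tutorial", "java code examples"],
    "espresso": ["espresso coffee brewing guide", "italian coffee drink"],
    "latte": ["latte coffee milk drink", "coffee brewing with milk"],
}

tree = build_hierarchy(SNIPPETS)
print(tree)
```

On this toy data the two coffee terms and the two programming terms end up as sibling pairs, even though the segments themselves share no words; the similarity comes entirely from the snippet text.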
