A practical web-based approach to generating topic hierarchy for text segments
S. Chuang and L. Chien. CIKM '04: Proceedings of the thirteenth ACM conference on Information and knowledge management, pages 127--136. New York, NY, USA, ACM Press (2004)
DOI: 10.1145/1031171.1031193
Abstract
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of text segments. Different from traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on different domains of text segments. The obtained results have shown the potential of the proposed approach, which is believed able to benefit many information systems.
"4.3 Cluster Naming
To provide users a more comprehensive hierarchy of clusters,
the internal nodes should be labeled with some concise
names. Although it is essential to label clusters, only a few
works really dealt with it [18, 13, 9]. In Muller et al. [18], the
labels of a cluster were chosen as the n most frequent terms
in the cluster. Lawrie et al. [13] extracted salient words and
phrases of the instances in a cluster from retrieved documents
to organize them hierarchically using a type of cooccurrence
known as subsumption. Glover et al. [9] inferred
hierarchical relationships and descriptions by employing a
statistical model they created to distinguish between the
parent, self, and child features in a set of documents.
To name a cluster is a rather intellectual and challenging
work. As we mentioned, this work focuses on how to link the
clusters with close concepts and to decide appropriate levels
in the hierarchy to position them. The cluster naming is not
fully investigated in our current stage of study. We simply
take the most frequent co-occurred feature terms from the
composed instances to name the cluster. Even so, as being
illustrated in Figure 7, such a primitive approach still provides
an easier way for users to understand the concepts of
the generated cluster hierarchy."
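A minimal sketch of the naming heuristic quoted above, assuming each instance in a cluster is already represented as a list of feature terms (for example, terms drawn from search-result snippets). The quoted passage does not say how "co-occurred" terms are counted; counting a term once per instance in which it appears is an assumption made here, and the names `name_cluster`, `instance_features`, and `top_n` are hypothetical.

```python
from collections import Counter


def name_cluster(instance_features, top_n=3):
    """Label a cluster with the feature terms shared by the most instances.

    instance_features: list of term lists, one list per text segment
                       in the cluster (hypothetical representation).
    top_n: number of terms to return as the label (assumed parameter).
    """
    counts = Counter()
    for terms in instance_features:
        # Count each term once per instance, so the score reflects how many
        # instances share the term rather than raw term frequency.
        counts.update(set(terms))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Illustrative data only; these terms are not taken from the paper.
cluster = [
    ["search", "engine", "web"],
    ["web", "browser"],
    ["web", "search", "index"],
]
print(name_cluster(cluster, top_n=2))
```

Counting per instance rather than per occurrence keeps a single verbose instance from dominating the label; a raw-frequency variant would simply drop the `set(...)` call.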
%0 Conference Paper
%1 citeulike:266704
%A Chuang, Shui-Lung
%A Chien, Lee-Feng
%B CIKM '04: Proceedings of the thirteenth ACM conference on Information and knowledge management
%C New York, NY, USA
%D 2004
%I ACM Press
%K shorttext thesaurus
%P 127--136
%R 10.1145/1031171.1031193
%T A practical web-based approach to generating topic hierarchy for text segments
%U http://dx.doi.org/10.1145/1031171.1031193
%X It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of text segments. Different from traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on different domains of text segments. The obtained results have shown the potential of the proposed approach, which is believed able to benefit many information systems.
%@ 1581138741
@inproceedings{citeulike:266704,
abstract = {It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper, we address the problem of generating topic hierarchies for diverse text segments with a general and practical approach that uses the Web as an additional knowledge source. Unlike long documents, short text segments typically do not contain enough information to extract reliable features. This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments. A hierarchical clustering algorithm is then applied to create the hierarchical topic structure of text segments. Different from traditional clustering algorithms, which tend to produce cluster hierarchies with a very unnatural shape, the approach tries to produce a more natural and comprehensive hierarchy. Extensive experiments were conducted on different domains of text segments. The obtained results have shown the potential of the proposed approach, which is believed able to benefit many information systems.},
added-at = {2006-06-16T10:34:37.000+0200},
address = {New York, NY, USA},
author = {Chuang, Shui-Lung and Chien, Lee-Feng},
biburl = {https://www.bibsonomy.org/bibtex/27fad9ccefec8c09de2eabad2a219b5c0/ldietz},
booktitle = {CIKM '04: Proceedings of the thirteenth ACM conference on Information and knowledge management},
citeulike-article-id = {266704},
comment = {"4.3 Cluster Naming
To provide users a more comprehensive hierarchy of clusters,
the internal nodes should be labeled with some concise
names. Although it is essential to label clusters, only a few
works really dealt with it [18, 13, 9]. In Muller et al. [18], the
labels of a cluster were chosen as the n most frequent terms
in the cluster. Lawrie et al. [13] extracted salient words and
phrases of the instances in a cluster from retrieved documents
to organize them hierarchically using a type of cooccurrence
known as subsumption. Glover et al. [9] inferred
hierarchical relationships and descriptions by employing a
statistical model they created to distinguish between the
parent, self, and child features in a set of documents.
To name a cluster is a rather intellectual and challenging
work. As we mentioned, this work focuses on how to link the
clusters with close concepts and to decide appropriate levels
in the hierarchy to position them. The cluster naming is not
fully investigated in our current stage of study. We simply
take the most frequent co-occurred feature terms from the
composed instances to name the cluster. Even so, as being
illustrated in Figure 7, such a primitive approach still provides
an easier way for users to understand the concepts of
the generated cluster hierarchy."},
doi = {10.1145/1031171.1031193},
interhash = {a09b206839bc9e02b53756c639add24d},
intrahash = {7fad9ccefec8c09de2eabad2a219b5c0},
isbn = {1581138741},
keywords = {shorttext thesaurus},
pages = {127--136},
priority = {0},
publisher = {ACM Press},
timestamp = {2006-06-16T10:34:37.000+0200},
title = {A practical web-based approach to generating topic hierarchy for text segments},
url = {http://dx.doi.org/10.1145/1031171.1031193},
year = {2004}
}