Inproceedings

Thematic text clustering for domain specific language model adaptation

2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03), pages 513-518, 2003.
DOI: 10.1109/ASRU.2003.1318493

Abstract

We propose a new approach to thematic text clustering. The resulting text clusters are used to build domain-specific language models (LMs), addressing the problem of language model adaptation. The method relies on a new discriminative n-gram-based term selection process (n > 1), which reduces the influence of corpus inhomogeneity and outputs only semantically focused n-grams as the most representative key terms in the corpus. These key terms are then used to automatically cluster the whole document collection, and an LM is generated from each text cluster. Different key term selection methods are evaluated using perplexity as the measure. The automatically computed clusters are compared with a manual labelling based on genre information. The results of these experimental studies are presented and discussed. Compared to the manual clustering, a significant performance improvement of between 21.87% and 53.12% is observed, depending on the chosen key term selection method.
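The abstract evaluates cluster-specific LMs by perplexity. The sketch below illustrates that evaluation step only, under stated assumptions: it uses an add-one (Laplace) smoothed bigram model as a stand-in, which is not necessarily the LM used in the paper, and it assumes the reported percentages are relative perplexity reductions against a baseline. All function names (`train_bigram_lm`, `perplexity`, `relative_improvement`) and the toy data are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical symbol for out-of-vocabulary words


def train_bigram_lm(sentences):
    """Count histories and bigrams over tokenized sentences padded with <s> and </s>."""
    history_counts, bigram_counts = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        history_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    # Predictable symbols: training words, the end marker, and one <unk> slot.
    vocab = {w for tokens in sentences for w in tokens} | {"</s>", UNK}
    return history_counts, bigram_counts, vocab


def bigram_prob(lm, history, word):
    """Add-one smoothed P(word | history); sums to 1 over the vocabulary."""
    history_counts, bigram_counts, vocab = lm
    return (bigram_counts[(history, word)] + 1) / (history_counts[history] + len(vocab))


def perplexity(lm, sentences):
    """PP = exp(-(1/N) * sum log P(w_i | w_{i-1})), N = number of predicted tokens."""
    _, _, vocab = lm
    log_prob, n_tokens = 0.0, 0
    for tokens in sentences:
        mapped = [w if w in vocab else UNK for w in tokens]  # map OOV words to <unk>
        padded = ["<s>"] + mapped + ["</s>"]
        for history, word in zip(padded[:-1], padded[1:]):
            log_prob += math.log(bigram_prob(lm, history, word))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


def relative_improvement(pp_baseline, pp_new):
    """Percentage perplexity reduction of pp_new relative to pp_baseline (assumed metric)."""
    return 100.0 * (pp_baseline - pp_new) / pp_baseline


if __name__ == "__main__":
    # Hypothetical toy clusters; real clusters would come from key-term-based clustering.
    sports = [["the", "team", "won", "the", "match"], ["the", "team", "lost"]]
    finance = [["the", "market", "fell"], ["the", "stock", "market", "rose"]]
    test = [["the", "team", "won"]]
    pp_sports = perplexity(train_bigram_lm(sports), test)
    pp_finance = perplexity(train_bigram_lm(finance), test)
    print(f"in-domain PP={pp_sports:.2f}, out-of-domain PP={pp_finance:.2f}")
    print(f"relative improvement: {relative_improvement(pp_finance, pp_sports):.2f}%")
```

On the toy data, the LM trained on the thematically matching cluster assigns the test sentence a lower perplexity, which is the effect that domain-specific clustering aims for. Add-one smoothing was chosen only because it keeps the example short and properly normalized; practical systems typically use stronger smoothing.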
