Inproceedings

Thematic text clustering for domain specific language model adaptation

Z. Valsan and M. Emele.
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on, pages 513-518. (2003)
DOI: 10.1109/ASRU.2003.1318493

Abstract

We propose a new approach to thematic text clustering. The resulting text clusters are used to build domain-specific language models (LMs), addressing the problem of language model adaptation. The method relies on a new discriminative, n-gram-based term selection process (n > 1) that reduces the influence of corpus inhomogeneity and outputs only semantically focused n-grams as the most representative key terms in the corpus. These key terms are then used to automatically cluster the whole document collection, and an LM is generated from each text cluster. Different key term selection methods are evaluated using perplexity as the measure. The automatically computed clusters are compared with a manual labelling based on genre information. The results of these experiments are presented and discussed. Compared to the manual clustering, a significant performance improvement of between 21.87% and 53.12% is observed, depending on the chosen key term selection method.
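The abstract does not spell out the selection or clustering criteria, so the following is only a minimal sketch of the general pipeline (key terms, then clusters, then one LM per cluster, then perplexity), not the authors' method. Assumptions: the cluster labels and key bigrams are hypothetical and hand-picked rather than produced by the paper's discriminative selection; each document goes to the cluster whose key n-grams it contains most often; each cluster gets an add-one smoothed unigram LM instead of the n-gram models a real ASR system would use. Perplexity is computed as PP = exp(-(1/N) * sum_i log p(w_i)), and "relative improvement" is assumed to mean relative perplexity reduction against a baseline.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def assign_cluster(tokens, key_terms, n=2):
    """Assign a document to the cluster whose key n-grams it contains most often.

    key_terms maps a (hypothetical) cluster label to a set of key n-grams.
    Ties go to the label that sorts first.
    """
    doc_ngrams = Counter(ngrams(tokens, n))

    def hits(label):
        return sum(doc_ngrams[g] for g in key_terms[label])

    return max(sorted(key_terms), key=hits)


def unigram_lm(docs, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary (illustrative stand-in)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def perplexity(lm, tokens):
    """PP = exp(-(1/N) * sum of log p(w_i)); every token must be in the LM's vocabulary."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


def relative_improvement(pp_baseline, pp_new):
    """Relative perplexity reduction in percent (assumed definition)."""
    return (pp_baseline - pp_new) / pp_baseline * 100


# Toy example with hypothetical key terms and documents.
key_terms = {
    "sports": {("world", "cup"), ("final", "score")},
    "finance": {("stock", "market"), ("interest", "rate")},
}
docs = [
    "the world cup final score was close".split(),
    "the stock market fell as the interest rate rose".split(),
    "fans watched the world cup".split(),
]
test_sentence = "the world cup final".split()

# Cluster the collection, then train one LM per cluster.
clusters = {label: [] for label in key_terms}
for d in docs:
    clusters[assign_cluster(d, key_terms)].append(d)

vocab = {tok for d in docs for tok in d} | set(test_sentence)
lms = {label: unigram_lm(cluster_docs, vocab) for label, cluster_docs in clusters.items()}
pp = {label: perplexity(lm, test_sentence) for label, lm in lms.items()}
```

On this toy data the sports-themed test sentence gets a lower perplexity from the LM trained on the "sports" cluster than from the "finance" one, which is the effect domain-specific LMs aim for. Swapping the unigram model for a smoothed higher-order n-gram model would not change the overall structure.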

BibTeX key: valsan03clustering
entry type: inproceedings
booktitle: Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
year: 2003
pages: 513-518
file: valsan03clustering.pdf:papers\\valsan03clustering.pdf:PDF
isbn: 0-7803-7980-2
DOI: 10.1109/ASRU.2003.1318493
url: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1318493