Summarizing Topical Content with Word Frequency and Exclusivity

Abstract

Recent work in text analysis commonly describes topics in terms of their most frequent words, but the exclusivity of words to topics is equally important for communicating content. We introduce Hierarchical Poisson Convolution (HPC), a model which infers regularized estimates of the differential use of words across topics as well as their frequency within topics. HPC uses known hierarchical structure on human-labeled topics to make focused comparisons of differential usage within each branch of the hierarchy of labels. We then infer a summary for each topic in terms of words that are both frequent and exclusive. We develop a parallelized Hamiltonian Monte Carlo sampler that allows for fast and scalable computation.
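The idea of summarizing a topic by words that are both frequent and exclusive can be illustrated with a small sketch. The abstract does not give a scoring formula, so everything below is an assumption made for illustration: the function names (`ecdf`, `frequency_exclusivity_scores`), the `weight` parameter, the toy counts, and the choice to combine the two quantities as a weighted harmonic mean of their within-topic empirical CDF ranks. It works on raw counts and is not the HPC model, which infers regularized estimates instead.

```python
from bisect import bisect_right


def ecdf(values):
    """Empirical CDF value of each entry, computed within `values`."""
    ordered = sorted(values)
    n = len(values)
    # bisect_right counts entries <= v, so every result lies in (0, 1].
    return [bisect_right(ordered, v) / n for v in values]


def frequency_exclusivity_scores(counts, weight=0.5):
    """Score words by frequency and exclusivity (illustrative only).

    counts[k][w] is the count of word w in topic k; every topic is
    assumed to have at least one word occurrence.
    `weight` is the share given to exclusivity (hypothetical parameter).
    Returns scores[k][w] in (0, 1].
    """
    n_topics, n_words = len(counts), len(counts[0])

    # Frequency: rate of each word within its topic.
    rates = [[c / sum(row) for c in row] for row in counts]

    # Total rate of each word across all topics.
    col_sums = [sum(rates[k][w] for k in range(n_topics)) for w in range(n_words)]

    scores = []
    for k in range(n_topics):
        # Exclusivity: share of a word's total rate that falls in topic k.
        excl = [
            rates[k][w] / col_sums[w] if col_sums[w] > 0 else 0.0
            for w in range(n_words)
        ]
        f_rank = ecdf(rates[k])
        e_rank = ecdf(excl)
        # Weighted harmonic mean: a word needs both ranks high to score well.
        scores.append(
            [1.0 / (weight / e + (1.0 - weight) / f) for e, f in zip(e_rank, f_rank)]
        )
    return scores


# Toy data (made up): two topics over a five-word vocabulary.
vocab = ["the", "model", "gene", "court", "dna"]
counts = [
    [50, 10, 30, 0, 20],  # topic 0
    [50, 10, 0, 30, 0],   # topic 1
]
scores = frequency_exclusivity_scores(counts)
top_topic0 = max(range(len(vocab)), key=lambda w: scores[0][w])
print(vocab[top_topic0])  # prints "gene"
```

In this toy example "the" is the most frequent word in topic 0 but is shared equally with topic 1, so the word that is both common and distinctive, "gene", ranks first.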

BibTeX key: 10.5555/3042573.3042578
entry type: inproceedings
address: Madison, WI, USA
booktitle: Proceedings of the 29th International Conference on Machine Learning
year: 2012
pages: 9–16
publisher: Omnipress
series: ICML'12
isbn: 9781450312851
numpages: 8
location: Edinburgh, Scotland
