T. Griffiths and M. Steyvers. Proceedings of the National Academy of Sciences, 101 (suppl 1): 5228--5235 (2004)
Abstract
A first step in identifying the content of a document is determining
which topics that document addresses. We describe a generative
model for documents, introduced by Blei, Ng, and Jordan [Blei,
D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3,
993-1022], in which each document is generated by choosing a
distribution over topics and then choosing each word in the
document from a topic selected according to this distribution. We
then present a Markov chain Monte Carlo algorithm for inference
in this model. We use this algorithm to analyze abstracts from
PNAS by using Bayesian model selection to establish the number of
topics. We show that the extracted topics capture meaningful
structure in the data, consistent with the class designations
provided by the authors of the articles, and outline further
applications of this analysis, including identifying “hot topics” by
examining temporal dynamics and tagging abstracts to illustrate
semantic content.
%0 Journal Article
%1 griffiths2004finding
%A Griffiths, Thomas L
%A Steyvers, Mark
%D 2004
%I National Acad Sciences
%J Proceedings of the National Academy of Sciences
%K modeling topic
%N suppl 1
%P 5228--5235
%T Finding scientific topics
%U http://www.pnas.org/content/101/suppl_1/5228.full.pdf
%V 101
%X A first step in identifying the content of a document is determining
which topics that document addresses. We describe a generative
model for documents, introduced by Blei, Ng, and Jordan [Blei,
D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3,
993-1022], in which each document is generated by choosing a
distribution over topics and then choosing each word in the
document from a topic selected according to this distribution. We
then present a Markov chain Monte Carlo algorithm for inference
in this model. We use this algorithm to analyze abstracts from
PNAS by using Bayesian model selection to establish the number of
topics. We show that the extracted topics capture meaningful
structure in the data, consistent with the class designations
provided by the authors of the articles, and outline further
applications of this analysis, including identifying “hot topics” by
examining temporal dynamics and tagging abstracts to illustrate
semantic content.
@article{griffiths2004finding,
abstract = {A first step in identifying the content of a document is determining
which topics that document addresses. We describe a generative
model for documents, introduced by Blei, Ng, and Jordan [Blei,
D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3,
993-1022], in which each document is generated by choosing a
distribution over topics and then choosing each word in the
document from a topic selected according to this distribution. We
then present a Markov chain Monte Carlo algorithm for inference
in this model. We use this algorithm to analyze abstracts from
PNAS by using Bayesian model selection to establish the number of
topics. We show that the extracted topics capture meaningful
structure in the data, consistent with the class designations
provided by the authors of the articles, and outline further
applications of this analysis, including identifying ``hot topics'' by
examining temporal dynamics and tagging abstracts to illustrate
semantic content.},
added-at = {2016-10-20T22:49:06.000+0200},
author = {Griffiths, Thomas L and Steyvers, Mark},
biburl = {https://www.bibsonomy.org/bibtex/253ebf2a33807912c4c29dd99e9b553d3/huiyangsfsu},
interhash = {387a5060792d52ea73b02dd68e52559e},
intrahash = {53ebf2a33807912c4c29dd99e9b553d3},
journal = {Proceedings of the National Academy of Sciences},
keywords = {modeling topic},
number = {suppl 1},
pages = {5228--5235},
publisher = {National Acad Sciences},
timestamp = {2016-10-20T22:49:06.000+0200},
title = {Finding scientific topics},
url = {http://www.pnas.org/content/101/suppl_1/5228.full.pdf},
volume = 101,
year = 2004
}