Probabilistic author-topic models for information discovery

M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 306--315. New York, NY, USA, ACM Press, (2004)
DOI: 10.1145/1014052.1014087

Abstract

We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of each authors' topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.
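The two-stage generative process described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the uniform choice among a paper's authors, the symmetric Dirichlet priors, the hyperparameter values, and all names (`sample_corpus`, `theta`, `phi`, `alpha`, `beta`) are assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np


def sample_corpus(doc_authors, doc_lengths, n_authors, n_topics, n_words,
                  alpha=0.5, beta=0.1, seed=0):
    """Draw a toy corpus from an author-topic style generative process.

    Assumptions (not stated in the abstract): symmetric Dirichlet priors
    with parameters alpha and beta, and a uniform choice among the
    authors of a document for each word.
    """
    rng = np.random.default_rng(seed)

    # One distribution over topics per author (rows sum to 1).
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_authors)
    # One distribution over words per topic (rows sum to 1).
    phi = rng.dirichlet(np.full(n_words, beta), size=n_topics)

    docs = []
    for authors, length in zip(doc_authors, doc_lengths):
        tokens = []
        for _ in range(length):
            # Stage 1: pick one of the document's authors, then a topic
            # from that author's topic distribution.
            x = rng.choice(authors)
            z = rng.choice(n_topics, p=theta[x])
            # Stage 2: pick a word from the chosen topic's word distribution.
            w = rng.choice(n_words, p=phi[z])
            tokens.append((int(x), int(z), int(w)))
        docs.append(tokens)
    return theta, phi, docs
```

Calling, for example, `sample_corpus([[0, 1], [2]], [20, 10], n_authors=3, n_topics=4, n_words=50)` yields a two-author document of 20 tokens and a single-author document of 10 tokens, each token recorded as an (author, topic, word) triple. Inference with Markov chain Monte Carlo, as mentioned in the abstract, would run in the opposite direction: given only the words and author lists, it estimates `theta` and `phi`.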

Links and resources

BibTeX key: citeulike:378119
Entry type: inproceedings
Address: New York, NY, USA
Book title: KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2004
Pages: 306--315
Publisher: ACM Press
citeulike-article-id: 378119
priority: 0
isbn: 1581138889
comment: same authors as Finding Scientific Topics (PNAS); cited by The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email by McCallum A, Corrada-Emmanuel A, Wang X (http://www.citeulike.org/user/ldietz/article/344908). More from the application side. For mathematical details see http://www.citeulike.org/user/ldietz/article/383001
DOI: 10.1145/1014052.1014087
URL: http://dx.doi.org/10.1145/1014052.1014087


Cite this publication

@inproceedings{citeulike:378119,
  abstract = {We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of each authors' topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.},
  added-at = {2008-02-22T02:35:49.000+0100},
  address = {New York, NY, USA},
  author = {Steyvers, Mark and Smyth, Padhraic and Rosen-Zvi, Michal and Griffiths, Thomas},
  biburl = {https://www.bibsonomy.org/bibtex/220fb4bab61662864357a9edf960a9b9b/renew},
  booktitle = {KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining},
  citeulike-article-id = {378119},
  comment = {same authors as Finding Scientific Topics (PNAS) cited by The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email by Mccallum A, Corrada-Emmanuel A, Wang X http://www.citeulike.org/user/ldietz/article/344908 --- more from the application side. For mathematical details see http://www.citeulike.org/user/ldietz/article/383001},
  doi = {10.1145/1014052.1014087},
  interhash = {b80d5948a7089aa63ce0f7d349c5ab85},
  intrahash = {20fb4bab61662864357a9edf960a9b9b},
  isbn = {1581138889},
  keywords = {topic},
  pages = {306--315},
  priority = {0},
  publisher = {ACM Press},
  timestamp = {2008-02-26T13:44:33.000+0100},
  title = {Probabilistic author-topic models for information discovery},
  url = {http://dx.doi.org/10.1145/1014052.1014087},
  year = 2004
}
