Probabilistic author-topic models for information discovery

M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (2004)

Abstract

We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of each author's topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.
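The two-stage generative process described in the abstract can be illustrated with a short simulation. This is a minimal sketch assuming the standard author-topic formulation, in which the author responsible for each word is drawn uniformly from the document's coauthors; the function `generate_document`, the toy vocabulary, and the fixed `theta`/`phi` tables are hypothetical, and the Dirichlet priors on those tables are left out.

```python
import random


def generate_document(authors, theta, phi, vocab, n_words, rng=None):
    """Sample one document from an author-topic model (illustrative sketch).

    authors: list of author ids for this document (its coauthors).
    theta:   dict mapping author id -> list of topic probabilities (length T).
    phi:     list of T lists, each a probability distribution over vocab.
    vocab:   list of word strings.
    Returns a list of (word, author, topic) triples so the latent
    assignments can be inspected next to the observed words.
    """
    if rng is None:
        rng = random.Random()
    n_topics = len(phi)
    doc = []
    for _ in range(n_words):
        # Assumption: pick the "responsible" coauthor uniformly at random.
        x = rng.choice(authors)
        # Stage 1: pick a topic from that author's topic distribution.
        z = rng.choices(range(n_topics), weights=theta[x])[0]
        # Stage 2: pick a word from that topic's word distribution.
        w = rng.choices(vocab, weights=phi[z])[0]
        doc.append((w, x, z))
    return doc


# Hypothetical toy model: two authors, each tied to a single topic.
vocab = ["gibbs", "sampling", "markov", "query", "index", "database"]
phi = [[0.4, 0.3, 0.3, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.3, 0.3, 0.4]]
theta = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(generate_document(["alice", "bob"], theta, phi, vocab, 8, random.Random(0)))
```

In a two-author paper the resulting words are a mixture of both authors' topic mixtures, which is the property the abstract describes.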

Description

generative document model with latent author-topic variables
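The latent author and topic variables are never observed, so they must be inferred; the abstract says only that a Markov chain Monte Carlo algorithm is used. A collapsed Gibbs sampler, which resamples each token's (author, topic) pair from its conditional given all other assignments, is a common choice for this model. The sketch below assumes that scheme with symmetric Dirichlet hyperparameters `alpha` and `beta`; the function name `gibbs_author_topic`, the default values, and the iteration count are hypothetical and are not the paper's settings.

```python
import random


def gibbs_author_topic(docs, doc_authors, n_topics, vocab_size, n_authors,
                       alpha=0.5, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampler for an author-topic model (illustrative sketch).

    docs:        list of documents, each a list of word ids in [0, vocab_size).
    doc_authors: list of author-id lists, one per document.
    Returns (theta, phi) where theta[a][t] estimates P(topic t | author a)
    and phi[t][w] estimates P(word w | topic t).
    """
    rng = random.Random(seed)
    cwt = [[0] * n_topics for _ in range(vocab_size)]  # word-topic counts
    cat = [[0] * n_topics for _ in range(n_authors)]   # author-topic counts
    ct = [0] * n_topics                                # tokens per topic
    ca = [0] * n_authors                               # tokens per author

    def update(w, x, z, delta):
        # Add (delta=+1) or remove (delta=-1) one token's assignment.
        cwt[w][z] += delta
        cat[x][z] += delta
        ct[z] += delta
        ca[x] += delta

    # Random initial (author, topic) assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            x, z = rng.choice(doc_authors[d]), rng.randrange(n_topics)
            update(w, x, z, +1)
            row.append((x, z))
        assign.append(row)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                x, z = assign[d][i]
                update(w, x, z, -1)  # exclude the current token from counts
                # Unnormalized joint conditional over (author, topic) pairs.
                pairs, weights = [], []
                for a in doc_authors[d]:
                    for t in range(n_topics):
                        p_word = (cwt[w][t] + beta) / (ct[t] + vocab_size * beta)
                        p_topic = (cat[a][t] + alpha) / (ca[a] + n_topics * alpha)
                        pairs.append((a, t))
                        weights.append(p_word * p_topic)
                x, z = rng.choices(pairs, weights=weights)[0]
                update(w, x, z, +1)
                assign[d][i] = (x, z)

    # Smoothed point estimates from the final sample's counts.
    theta = [[(cat[a][t] + alpha) / (ca[a] + n_topics * alpha)
              for t in range(n_topics)] for a in range(n_authors)]
    phi = [[(cwt[w][t] + beta) / (ct[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(n_topics)]
    return theta, phi
```

Ranking authors by topic, as the abstract mentions, could then be done by sorting `theta[a][t]` over authors `a` for a fixed topic `t`; this is one possible reading, not necessarily the paper's exact procedure.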

Links and resources

BibTeX key: steyvers2004pat
entry type: article
year: 2004
journal: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
pages: 306--315
publisher: ACM, New York, NY, USA
Document: http://irlab.cis.nctu.edu.tw/Presentation2005/%E9%99%B3%E4%BB%A5%E7%90%86/4_20/Probabilistic%20Author-Topic%20Models%20for%20Information%20discovery.pdf

Tags

@tberg

Metadata

Last modified 16 years ago
Created 16 years ago