SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

K. Li, J. Chen, W. Chen, and J. Zhu. (2016). arXiv:1610.02496. Comment: 13 pages, 12 figures.

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems support. With a single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of tokens in a few hours, which previously was only achievable with clusters of tens of machines.
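The abstract only names a "sparsity-aware algorithm" with sublinear time complexity. As a hedged illustration of why sparsity can buy sublinear cost, the sketch below uses a common decomposition of the per-token topic conditional, assumed here to be p(z = k) ∝ (C_dk + α) φ_wk, into a sparse document bucket C_dk φ_wk (only the nonzero C_dk of document d contribute) and a dense smoothing bucket α φ_wk that depends only on the word and can be precomputed once per word. This is not SaberLDA's GPU kernel; the function names `build_word_table` and `sample_topic` are hypothetical, and the cumulative-sum table with binary search is a simple stand-in for whatever per-word structure (for example, an alias table) a real system would use.

```python
import bisect
import itertools
import random


def build_word_table(phi_w):
    """Precompute cumulative sums of phi_w (hypothetical helper).

    Built once per word w in O(K) and reused by every token of that word,
    so its cost is amortized. It must be rebuilt whenever phi changes.
    """
    return list(itertools.accumulate(phi_w))


def sample_topic(doc_counts, phi_w, cum_phi_w, alpha, rng=random):
    """Draw k with probability proportional to (C_dk + alpha) * phi_w[k].

    doc_counts: dict {topic: C_dk} holding only the nonzero counts of doc d.
    phi_w:      list of K topic probabilities for the current word w.
    cum_phi_w:  output of build_word_table(phi_w).
    alpha:      symmetric Dirichlet prior, assumed > 0.
    """
    # Sparse bucket: sum of C_dk * phi_wk over nonzero C_dk only, O(K_d).
    topics = list(doc_counts)
    weights = [doc_counts[k] * phi_w[k] for k in topics]
    sparse_mass = sum(weights)

    # Dense bucket: alpha * sum_k phi_wk, read from the precomputed table.
    dense_mass = alpha * cum_phi_w[-1]

    u = rng.random() * (sparse_mass + dense_mass)
    if u < sparse_mass:
        # Linear walk over the few nonzero document topics.
        acc = 0.0
        for k, wgt in zip(topics, weights):
            acc += wgt
            if u < acc:
                return k
        return topics[-1]  # guard against floating-point round-off

    # Rescale u into [0, sum_k phi_wk) and binary-search the table, O(log K).
    v = (u - sparse_mass) / alpha
    k = bisect.bisect_right(cum_phi_w, v)
    return min(k, len(cum_phi_w) - 1)
```

Under these assumptions a draw costs O(K_d + log K), where K_d is the number of distinct topics currently in document d, which is sublinear in K when documents touch few topics. A GPU implementation would organize the work very differently (the abstract mentions a warp-based sampling kernel and a custom data layout), so this only illustrates the complexity argument.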

Description

[1610.02496] SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Links and resources

BibTeX key: li2016saberlda
entry type: misc
year: 2016
url: http://arxiv.org/abs/1610.02496
note: arXiv:1610.02496. Comment: 13 pages, 12 figures

Cite this publication

@misc{li2016saberlda,
  abstract = {Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from billion-token-scale data with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems support. With a single GPU card, SaberLDA is able to learn 10,000 topics from a dataset of billions of tokens in a few hours, which previously was only achievable with clusters of tens of machines.},
  added-at = {2017-01-20T15:37:22.000+0100},
  author = {Li, Kaiwei and Chen, Jianfei and Chen, Wenguang and Zhu, Jun},
  biburl = {https://www.bibsonomy.org/bibtex/2670808c832bcc2e65b24c2ca691ff76f/albinzehe},
  description = {[1610.02496] SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs},
  interhash = {58c92456516fa085d8282307e7b9bab8},
  intrahash = {670808c832bcc2e65b24c2ca691ff76f},
  keywords = {gpu lda},
  note = {arXiv:1610.02496. Comment: 13 pages, 12 figures},
  timestamp = {2017-01-20T15:37:22.000+0100},
  title = {{SaberLDA}: Sparsity-Aware Learning of Topic Models on {GPUs}},
  url = {http://arxiv.org/abs/1610.02496},
  year = 2016
}

BibSonomy