Abstract
There is an explosion of data, documents, and other content, and people
require tools to analyze and interpret these — tools that turn content into
information and knowledge. Topic models have been developed to solve these
problems. Topic models such as LDA (Blei et al., 2003) allow salient patterns
in data to be extracted automatically. When analyzing texts, these patterns are
called topics. Among the numerous extensions of LDA, few can reliably
analyze multiple groups of documents and extract topic similarities. The
recently introduced differential topic model (SPDP; Chen et al., 2012)
performs uniformly better than many topic models in a discriminative setting.
There is also a need to improve the sampling speed of topic models. While
some effort has been made on distributed algorithms, no existing work uses
graphics processing units (GPUs), even though GPUs have become the most
cost-efficient platform for many problems.
In this thesis, I propose and implement a scalable multi-GPU distributed
parallel framework that approximates SPDP. Through experiments, I show that
my algorithm achieves a speed-up of roughly 50 times with only a single
inexpensive laptop GPU, while remaining almost as accurate. Furthermore, I
show that the speed-up scales sublinearly as more GPUs are used, while
largely maintaining accuracy. On a medium-sized GPU cluster, the speed-up
could therefore potentially reach a factor of a thousand.
Note that SPDP is just one representative of the many extensions of LDA.
Although my algorithm is implemented to work with SPDP, it is designed to be
general enough to work with other topic models. The speed-up on smaller
collections (i.e., thousands of documents) means that these more complex LDA
extensions could now run in real time, opening up new ways of using such
models in industry.