A distributed look-up architecture for text mining applications using MapReduce

A. Balkir, I. Foster, and A. Rzhetsky. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 59:1--59:11. New York, NY, USA, ACM, (2011)
DOI: 10.1145/2063384.2063463

Abstract

We study text analysis algorithms that use global optimization methods to compute local characteristics that are consistent with properties of the entire corpus rather than computed locally based on exogenous parameters. In the iterative implementations that we consider, each step both reads and updates a database of parameter values. Motivated by a need for rapid analysis of large corpora, we have developed methods for efficient access to such databases on parallel computers. These methods combine Bloom filters, in-memory caches, and an HBase cluster to reduce communication costs greatly relative to simpler approaches that either fully distribute or fully replicate the database. We also describe how this method can be incorporated into the MapReduce programming model, and illustrate its use within phrase segmentation programs. Our design can achieve considerable run time, latency and storage space improvements relative to other methods. In one phrase segmentation application, we improve performance by a factor of six relative to an HBase-based implementation.
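The abstract names three components (Bloom filters, in-memory caches, and an HBase cluster) but does not say how they are wired together. The following is a minimal sketch of one plausible arrangement, not the authors' design: a Bloom filter rejects keys that are certainly absent, a small LRU cache serves repeated keys locally, and only the remaining look-ups reach the remote store. The class names `BloomFilter` and `TieredLookup`, the filter sizes, and the plain dictionary standing in for the HBase table are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class TieredLookup:
    """Hypothetical look-up path: Bloom filter, then LRU cache, then remote store."""

    def __init__(self, remote_store, cache_size=2):
        self.remote_store = remote_store  # assumption: a dict stands in for HBase
        self.cache = OrderedDict()        # least recently used entry comes first
        self.cache_size = cache_size
        self.remote_calls = 0             # counts simulated network round trips
        self.bloom = BloomFilter()
        for key in remote_store:
            self.bloom.add(key)

    def get(self, key):
        # 1. Keys the filter has never seen cannot be in the store: skip the network.
        if not self.bloom.might_contain(key):
            return None
        # 2. Recently used keys are answered from local memory.
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # 3. Everything else costs one remote look-up.
        self.remote_calls += 1
        value = self.remote_store.get(key)
        if value is not None:
            self.cache[key] = value
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict the least recently used key
        return value
```

In this sketch, a look-up such as `TieredLookup({"text mining": 0.77}).get("text mining")` reaches the stand-in store once, and repeating it is served from the cache. The stored value is a made-up placeholder, not a figure from the paper.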

Links and resources

BibTeX key: Balkir:2011:DLA:2063384.2063463
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
year: 2011
pages: 59:1--59:11
publisher: ACM
series: SC '11
location: Seattle, Washington
acmid: 2063463
isbn: 978-1-4503-0771-0
numpages: 11
articleno: 59
DOI: 10.1145/2063384.2063463
url: http://doi.acm.org/10.1145/2063384.2063463

Tags

@ytyoun (this user's tags are highlighted)

super.computing

Cite this publication

@inproceedings{Balkir:2011:DLA:2063384.2063463,
  abstract = {We study text analysis algorithms that use global optimization methods to compute local characteristics that are consistent with properties of the entire corpus rather than computed locally based on exogenous parameters. In the iterative implementations that we consider, each step both reads and updates a database of parameter values. Motivated by a need for rapid analysis of large corpora, we have developed methods for efficient access to such databases on parallel computers. These methods combine Bloom filters, in-memory caches, and an HBase cluster to reduce communication costs greatly relative to simpler approaches that either fully distribute or fully replicate the database. We also describe how this method can be incorporated into the MapReduce programming model, and illustrate its use within phrase segmentation programs. Our design can achieve considerable run time, latency and storage space improvements relative to other methods. In one phrase segmentation application, we improve performance by a factor of six relative to an HBase-based implementation.},
  acmid = {2063463},
  added-at = {2012-08-16T04:53:48.000+0200},
  address = {New York, NY, USA},
  articleno = {59},
  author = {Balkir, Atilla Soner and Foster, Ian and Rzhetsky, Andrey},
  biburl = {https://www.bibsonomy.org/bibtex/2e4698b644e4d974f16505d502b671800/ytyoun},
  booktitle = {Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis},
  doi = {10.1145/2063384.2063463},
  interhash = {8cf4cfee406b23bef11065ebbf273523},
  intrahash = {e4698b644e4d974f16505d502b671800},
  isbn = {978-1-4503-0771-0},
  keywords = {super.computing},
  location = {Seattle, Washington},
  numpages = {11},
  pages = {59:1--59:11},
  publisher = {ACM},
  series = {SC '11},
  timestamp = {2012-08-16T04:53:48.000+0200},
  title = {A distributed look-up architecture for text mining applications using MapReduce},
  url = {http://doi.acm.org/10.1145/2063384.2063463},
  year = 2011
}

Metadata

Last modified 12 years ago
Created 12 years ago