Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
D. Talbot and M. Osborne. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 468--476. Prague, Czech Republic, Association for Computational Linguistics, (June 2007)
Abstract
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed language model probabilities from BFs. We investigate how a BF containing n-gram statistics can be used as a direct replacement for a conventional n-gram model. Recent work has demonstrated that corpus statistics can be stored efficiently within a BF; here we consider how smoothed language model probabilities can be derived efficiently from this randomised representation. Our proposal takes advantage of the one-sided error guarantees of the BF and simple inequalities that hold between related n-gram statistics in order to further reduce the BF storage requirements and the error rate of the derived probabilities. We use these models as replacements for a conventional language model in machine translation experiments.
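The abstract hinges on the Bloom filter's one-sided error guarantee: items actually inserted are always reported present, while queries for absent items may yield false positives with some quantifiable probability. As an illustrative sketch only (not the authors' implementation; the class name, sizes, and double-hashing scheme are choices made here for brevity), a minimal Bloom filter over n-gram strings might look like this:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.

    One-sided error: inserted items are always found (no false
    negatives); absent items may be falsely reported present.
    """

    def __init__(self, m: int, k: int):
        self.m = m                        # number of bits
        self.k = k                        # number of hash positions
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter(m=10_000, k=5)
for ngram in ["the cat", "cat sat", "sat on"]:
    bf.add(ngram)

assert "the cat" in bf   # stored n-grams are always reported present
```

A query such as `"on the" in bf` may return either answer: `False` is certain to be correct, while `True` could be a false positive, which is exactly the asymmetry the paper's smoothing framework exploits.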
%0 Conference Paper
%1 talbot-osborne:2007:EMNLP-CoNLL2007
%A Talbot, David
%A Osborne, Miles
%B Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
%C Prague, Czech Republic
%D 2007
%I Association for Computational Linguistics
%K Bloom LM compression filter lossy smoothing
%P 468--476
%T Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
%U http://www.aclweb.org/anthology/D/D07/D07-1049
%X A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed language model probabilities from BFs. We investigate how a BF containing n-gram statistics can be used as a direct replacement for a conventional n-gram model. Recent work has demonstrated that corpus statistics can be stored efficiently within a BF; here we consider how smoothed language model probabilities can be derived efficiently from this randomised representation. Our proposal takes advantage of the one-sided error guarantees of the BF and simple inequalities that hold between related n-gram statistics in order to further reduce the BF storage requirements and the error rate of the derived probabilities. We use these models as replacements for a conventional language model in machine translation experiments.
@inproceedings{talbot-osborne:2007:EMNLP-CoNLL2007,
abstract = {A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed language model probabilities from BFs. We investigate how a BF containing n-gram statistics can be used as a direct replacement for a conventional n-gram model. Recent work has demonstrated that corpus statistics can be stored efficiently within a BF; here we consider how smoothed language model probabilities can be derived efficiently from this randomised representation. Our proposal takes advantage of the one-sided error guarantees of the BF and simple inequalities that hold between related n-gram statistics in order to further reduce the BF storage requirements and the error rate of the derived probabilities. We use these models as replacements for a conventional language model in machine translation experiments.},
added-at = {2008-11-26T10:37:53.000+0100},
address = {Prague, Czech Republic},
author = {Talbot, David and Osborne, Miles},
biburl = {https://www.bibsonomy.org/bibtex/22ed49b4ba976df063e30bc2d29066c73/jjv},
booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
interhash = {0989d43768e141073d423ead28f98a02},
intrahash = {2ed49b4ba976df063e30bc2d29066c73},
keywords = {Bloom LM compression filter lossy smoothing},
month = {June},
pages = {468--476},
publisher = {Association for Computational Linguistics},
timestamp = {2008-11-26T10:37:53.000+0100},
title = {Smoothed {B}loom Filter Language Models: Tera-Scale {LM}s on the Cheap},
url = {http://www.aclweb.org/anthology/D/D07/D07-1049},
year = 2007
}