Scaling to very very large corpora for natural language disambiguation
M. Banko and E. Brill. ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 26--33. Morristown, NJ, USA, Association for Computational Linguistics, (2001)
DOI: 10.3115/1073012.1073017
Abstract
The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.
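To make the task concrete: confusion set disambiguation means choosing, from a set of commonly confused words (e.g. {principal, principle}), the member that belongs in a given context. The sketch below is a minimal toy illustration of this setup, not the learners evaluated in the paper; the confusion set, training sentences, and the simple smoothed context-count scorer are all invented for illustration. It does show why labeled data is "free" here: any well-edited sentence containing a confusion-set member is automatically a labeled example.

```python
from collections import Counter, defaultdict

# Toy confusion set; the paper uses standard sets such as {their, there}.
CONFUSION_SET = ("principal", "principle")

# "Labeled" data is free: each sentence containing a confusion-set member
# is a training example for that member. These sentences are invented.
training = [
    ("principal", "the school principal spoke at the assembly"),
    ("principal", "she met the principal in his office"),
    ("principle", "the principle of least privilege guides security design"),
    ("principle", "he refused on principle to sign the form"),
]

# Count the context words observed with each member of the confusion set.
context_counts = defaultdict(Counter)
for label, sentence in training:
    for word in sentence.split():
        if word != label:
            context_counts[label][word] += 1

def disambiguate(context_words):
    """Pick the confusion-set member whose training contexts best match
    the given context words (add-one smoothed counts)."""
    def score(label):
        counts = context_counts[label]
        return sum(counts[w] + 1 for w in context_words)
    return max(CONFUSION_SET, key=score)

print(disambiguate(["the", "school", "hired", "a", "new"]))  # principal
print(disambiguate(["of", "least", "resistance"]))           # principle
```

The paper's point is about the data axis rather than the model axis: with a scorer this simple, accuracy is driven almost entirely by how much text feeds the counts, which is what the authors observe when scaling training corpora by orders of magnitude.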
%0 Conference Paper
%1 Banko01
%A Banko, Michele
%A Brill, Eric
%B ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics
%C Morristown, NJ, USA
%D 2001
%I Association for Computational Linguistics
%K LargeScale ambiguity
%P 26--33
%R 10.3115/1073012.1073017
%T Scaling to very very large corpora for natural language disambiguation
%U http://portal.acm.org/citation.cfm?id=1073017
%X The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.
@inproceedings{Banko01,
abstract = {The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.},
added-at = {2008-09-24T11:45:53.000+0200},
address = {Morristown, NJ, USA},
author = {Banko, Michele and Brill, Eric},
biburl = {https://www.bibsonomy.org/bibtex/27d0bd5b964c1fc33bd303c0ecb143d47/mkroell},
booktitle = {ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics},
description = {Scaling to very very large corpora for natural language disambiguation},
doi = {10.3115/1073012.1073017},
interhash = {6b6b98539e848e6d0fb9b427be12dd9e},
intrahash = {7d0bd5b964c1fc33bd303c0ecb143d47},
keywords = {LargeScale ambiguity},
location = {Toulouse, France},
pages = {26--33},
publisher = {Association for Computational Linguistics},
timestamp = {2008-12-23T14:33:16.000+0100},
title = {Scaling to very very large corpora for natural language disambiguation},
url = {http://portal.acm.org/citation.cfm?id=1073017},
year = 2001
}