Creating Robust Supervised Classifiers via Web-Scale N-gram Data

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), (2010)

Abstract

In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
