Word representations: a simple and general method for semi-supervised learning
J. Turian, L. Ratinov, and Y. Bengio. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384--394. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)
Abstract
If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/
@inproceedings{Turian:2010:WRS:1858681.1858721,
abstract = {If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih \& Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/},
acmid = {1858721},
address = {Stroudsburg, PA, USA},
author = {Turian, Joseph and Ratinov, Lev and Bengio, Yoshua},
biburl = {https://www.bibsonomy.org/bibtex/27b5f1660ca5f16006f03beb505b668c4/jil},
booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics},
keywords = {brown cluster clustering embedding evaluation overview word},
location = {Uppsala, Sweden},
numpages = {11},
pages = {384--394},
publisher = {Association for Computational Linguistics},
series = {ACL '10},
title = {Word representations: a simple and general method for semi-supervised learning},
url = {http://dl.acm.org/citation.cfm?id=1858721},
year = {2010}
}