X. Liu, S. Zhang, F. Wei, and M. Zhou. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pp. 359--367. Stroudsburg, PA, USA, Association for Computational Linguistics, (2011)
Abstract
The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
%0 Conference Paper
%1 Liu:2011:RNE:2002472.2002519
%A Liu, Xiaohua
%A Zhang, Shaodian
%A Wei, Furu
%A Zhou, Ming
%B Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
%C Stroudsburg, PA, USA
%D 2011
%I Association for Computational Linguistics
%K conceptExtraction entities named recognizing twitter
%P 359--367
%T Recognizing named entities in tweets
%U http://dl.acm.org/citation.cfm?id=2002472.2002519
%X The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
%@ 978-1-932432-87-9
@inproceedings{Liu:2011:RNE:2002472.2002519,
abstract = {The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.},
acmid = {2002519},
added-at = {2013-01-26T17:37:16.000+0100},
address = {Stroudsburg, PA, USA},
author = {Liu, Xiaohua and Zhang, Shaodian and Wei, Furu and Zhou, Ming},
biburl = {https://www.bibsonomy.org/bibtex/2f45c1724ad1dd7f9862b8ed9f9d2faba/asmelash},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1},
interhash = {ad3f295fe10bb6af3d31408a5ee47e99},
intrahash = {f45c1724ad1dd7f9862b8ed9f9d2faba},
isbn = {978-1-932432-87-9},
keywords = {conceptExtraction entities named recognizing twitter},
location = {Portland, Oregon},
numpages = {9},
pages = {359--367},
publisher = {Association for Computational Linguistics},
series = {HLT '11},
timestamp = {2013-01-26T17:37:16.000+0100},
title = {Recognizing named entities in tweets},
url = {http://dl.acm.org/citation.cfm?id=2002472.2002519},
year = 2011
}