Telling English Tweets Apart: the Case of US, GB, AU
A. Hadgu, N. Lotze, and R. Jäschke. Proceedings of the Workshop on Natural Language Processing and Computational Social Science, Hannover, Germany, (May 2016)
Abstract
In this paper, we study how to automatically tell different varieties of English apart on Twitter by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character Ngrams are quite effective for the task.
%0 Conference Paper
%1 hadgu2016telling
%A Hadgu, Asmelash Teka
%A Lotze, Netaya
%A Jäschke, Robert
%B Proceedings of the Workshop on Natural Language Processing and Computational Social Science
%C Hannover, Germany
%D 2016
%K 2016 classification css ddm detection english feature_evaluation language mk5.4 myown nlp twitter variety
%T Telling English Tweets Apart: the Case of US, GB, AU
%X In this paper, we study how to automatically tell different varieties of English apart on Twitter by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character Ngrams are quite effective for the task.
@inproceedings{hadgu2016telling,
abstract = {In this paper, we study how to automatically tell different varieties of English apart on Twitter by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character Ngrams are quite effective for the task.},
added-at = {2016-04-20T09:39:02.000+0200},
address = {Hannover, Germany},
author = {Hadgu, Asmelash Teka and Lotze, Netaya and Jäschke, Robert},
biburl = {https://www.bibsonomy.org/bibtex/2aa3f06ca2ac7f1f1ac9c309d36875adc/jaeschke},
booktitle = {Proceedings of the Workshop on Natural Language Processing and Computational Social Science},
interhash = {43efaf96502cc3343b97fb5a1e233a5b},
intrahash = {aa3f06ca2ac7f1f1ac9c309d36875adc},
keywords = {2016 classification css ddm detection english feature_evaluation language mk5.4 myown nlp twitter variety},
month = may,
series = {NLP+CSS at WebSci},
timestamp = {2021-06-11T14:18:08.000+0200},
title = {Telling English Tweets Apart: the Case of US, GB, AU},
year = 2016
}