A Latent Variable Model for Geographic Lexical Variation
J. Eisenstein, B. O'Connor, N. A. Smith, and E. P. Xing. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1277--1287. Stroudsburg, PA, USA, Association for Computational Linguistics (2010)
Abstract
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
%0 Conference Paper
%1 Eisenstein:2010:LVM:1870658.1870782
%A Eisenstein, Jacob
%A O'Connor, Brendan
%A Smith, Noah A.
%A Xing, Eric P.
%B Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
%C Stroudsburg, PA, USA
%D 2010
%I Association for Computational Linguistics
%K english language language-twitter lexical twitter
%P 1277--1287
%T A Latent Variable Model for Geographic Lexical Variation
%U http://dl.acm.org/citation.cfm?id=1870658.1870782
%X The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
@inproceedings{Eisenstein:2010:LVM:1870658.1870782,
abstract = {The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.},
acmid = {1870782},
added-at = {2015-05-12T14:26:51.000+0200},
address = {Stroudsburg, PA, USA},
author = {Eisenstein, Jacob and O'Connor, Brendan and Smith, Noah A. and Xing, Eric P.},
biburl = {https://www.bibsonomy.org/bibtex/2baf2a8785a5ea1b8713fbb0b2bcc104b/asmelash},
booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing},
description = {A latent variable model for geographic lexical variation},
interhash = {5fa617d9c03183fedefb2bf2ce3cf6ba},
intrahash = {baf2a8785a5ea1b8713fbb0b2bcc104b},
keywords = {english language language-twitter lexical twitter},
location = {Cambridge, Massachusetts},
numpages = {11},
pages = {1277--1287},
publisher = {Association for Computational Linguistics},
series = {EMNLP '10},
timestamp = {2015-05-12T14:26:51.000+0200},
title = {A Latent Variable Model for Geographic Lexical Variation},
url = {http://dl.acm.org/citation.cfm?id=1870658.1870782},
year = {2010}
}