You Are Where You Tweet: A Content-based Approach to Geo-locating Twitter Users
Z. Cheng, J. Caverlee, and K. Lee. Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 759--768. ACM, New York, NY, USA (2010)
DOI: 10.1145/1871437.1871535
Abstract
We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.
%0 Conference Paper
%1 Cheng:2010:YYT:1871437.1871535
%A Cheng, Zhiyuan
%A Caverlee, James
%A Lee, Kyumin
%B Proceedings of the 19th ACM International Conference on Information and Knowledge Management
%C New York, NY, USA
%D 2010
%I ACM
%K k3 language language-twitter location twitter
%P 759--768
%R 10.1145/1871437.1871535
%T You Are Where You Tweet: A Content-based Approach to Geo-locating Twitter Users
%U http://doi.acm.org/10.1145/1871437.1871535
%X We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51% of Twitter users within 100 miles of their actual location.
%@ 978-1-4503-0099-5
@inproceedings{Cheng:2010:YYT:1871437.1871535,
abstract = {We propose and evaluate a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets, even in the absence of any other geospatial cues. By augmenting the massive human-powered sensing capabilities of Twitter and related microblogging services with content-derived location information, this framework can overcome the sparsity of geo-enabled features in these services and enable new location-based personalized information services, the targeting of regional advertisements, and so on. Three of the key features of the proposed approach are: (i) its reliance purely on tweet content, meaning no need for user IP information, private login information, or external knowledge bases; (ii) a classification component for automatically identifying words in tweets with a strong local geo-scope; and (iii) a lattice-based neighborhood smoothing model for refining a user's location estimate. The system estimates k possible locations for each user in descending order of confidence. On average we find that the location estimates converge quickly (needing just 100s of tweets), placing 51\% of Twitter users within 100 miles of their actual location.},
acmid = {1871535},
added-at = {2015-03-14T21:38:37.000+0100},
address = {New York, NY, USA},
author = {Cheng, Zhiyuan and Caverlee, James and Lee, Kyumin},
biburl = {https://www.bibsonomy.org/bibtex/2fd6eed16cf052176704383aa7c03dbe1/asmelash},
booktitle = {Proceedings of the 19th ACM International Conference on Information and Knowledge Management},
description = {You are where you tweet},
doi = {10.1145/1871437.1871535},
interhash = {b993f878cb2f045ee7bb080c7d298720},
intrahash = {fd6eed16cf052176704383aa7c03dbe1},
isbn = {978-1-4503-0099-5},
keywords = {k3 language language-twitter location twitter},
location = {Toronto, ON, Canada},
numpages = {10},
pages = {759--768},
publisher = {ACM},
series = {CIKM '10},
timestamp = {2016-02-08T18:00:14.000+0100},
title = {You Are Where You Tweet: A Content-based Approach to Geo-locating Twitter Users},
url = {http://doi.acm.org/10.1145/1871437.1871535},
year = 2010
}