Geotagging Named Entities in News and Online Documents
J. Rafiei and D. Rafiei. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 1321–1330. New York, NY, USA, ACM, (2016)
DOI: 10.1145/2983323.2983795
Abstract
News sources generate constant streams of text with many references to real world entities; understanding the content from such sources often requires effectively detecting the geographic foci of the entities. We study the problem of associating geography to named entities in online documents. More specifically, given a named entity and a page (or a set of pages) where the entity is mentioned, the problem being studied is how the geographic focus of the name can be resolved at a location granularity (e.g. city or country), assuming that the name has a geographic focus. We further study dispersion, and show that the dispersion of a name can be estimated with a good accuracy, allowing a geo-centre to be detected at an exact dispersion level. Two key features of our approach are: (i) minimal assumption is made on the structure of the mentions hence the approach can be applied to a diverse and heterogeneous set of web pages, and (ii) the approach is unsupervised, leveraging shallow English linguistic features and the large volume of location data in public domain. We evaluate our methods under different task settings and with different categories of named entities. Our evaluation reveals that the geo-centre of a name can be estimated with a good accuracy based on some simple statistics of the mentions, and that the accuracy of the estimation varies with the categories of the names.
%0 Conference Paper
%1 Rafiei:2016:GNE:2983323.2983795
%A Rafiei, Jiangwei Yu
%A Rafiei, Davood
%B Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
%C New York, NY, USA
%D 2016
%I ACM
%K entities geotagging named
%P 1321--1330
%R 10.1145/2983323.2983795
%T Geotagging Named Entities in News and Online Documents
%U http://doi.acm.org/10.1145/2983323.2983795
%X News sources generate constant streams of text with many references to real world entities; understanding the content from such sources often requires effectively detecting the geographic foci of the entities. We study the problem of associating geography to named entities in online documents. More specifically, given a named entity and a page (or a set of pages) where the entity is mentioned, the problem being studied is how the geographic focus of the name can be resolved at a location granularity (e.g. city or country), assuming that the name has a geographic focus. We further study dispersion, and show that the dispersion of a name can be estimated with a good accuracy, allowing a geo-centre to be detected at an exact dispersion level. Two key features of our approach are: (i) minimal assumption is made on the structure of the mentions hence the approach can be applied to a diverse and heterogeneous set of web pages, and (ii) the approach is unsupervised, leveraging shallow English linguistic features and the large volume of location data in public domain. We evaluate our methods under different task settings and with different categories of named entities. Our evaluation reveals that the geo-centre of a name can be estimated with a good accuracy based on some simple statistics of the mentions, and that the accuracy of the estimation varies with the categories of the names.
%@ 978-1-4503-4073-1
@inproceedings{Rafiei:2016:GNE:2983323.2983795,
abstract = {News sources generate constant streams of text with many references to real world entities; understanding the content from such sources often requires effectively detecting the geographic foci of the entities. We study the problem of associating geography to named entities in online documents. More specifically, given a named entity and a page (or a set of pages) where the entity is mentioned, the problem being studied is how the geographic focus of the name can be resolved at a location granularity (e.g. city or country), assuming that the name has a geographic focus. We further study dispersion, and show that the dispersion of a name can be estimated with a good accuracy, allowing a geo-centre to be detected at an exact dispersion level. Two key features of our approach are: (i) minimal assumption is made on the structure of the mentions hence the approach can be applied to a diverse and heterogeneous set of web pages, and (ii) the approach is unsupervised, leveraging shallow English linguistic features and the large volume of location data in public domain. We evaluate our methods under different task settings and with different categories of named entities. Our evaluation reveals that the geo-centre of a name can be estimated with a good accuracy based on some simple statistics of the mentions, and that the accuracy of the estimation varies with the categories of the names.},
acmid = {2983795},
added-at = {2016-12-07T15:17:03.000+0100},
address = {New York, NY, USA},
author = {Rafiei, Jiangwei Yu and Rafiei, Davood},
biburl = {https://www.bibsonomy.org/bibtex/27d6b0dde05259c7d0cd81d8f87f826ef/ntempelmeier},
booktitle = {Proceedings of the 25th ACM International on Conference on Information and Knowledge Management},
description = {Geotagging Named Entities in News and Online Documents},
doi = {10.1145/2983323.2983795},
interhash = {a4aeafd84fd95ff75e0870506edf5dc5},
intrahash = {7d6b0dde05259c7d0cd81d8f87f826ef},
isbn = {978-1-4503-4073-1},
keywords = {entities geotagging named},
location = {Indianapolis, Indiana, USA},
numpages = {10},
pages = {1321--1330},
publisher = {ACM},
series = {CIKM '16},
timestamp = {2016-12-07T15:17:03.000+0100},
title = {Geotagging Named Entities in News and Online Documents},
url = {http://doi.acm.org/10.1145/2983323.2983795},
year = 2016
}