Abstract
GDELT—Global Data on Events, Location and Tone—is a new CAMEO-coded data
set containing more than 200 million geolocated events with global coverage from 1979 to
the present. The data are based on news reports from a variety of international news
sources coded using the Tabari system for events and additional software for location
and tone. The data are freely available, and we expect to provide daily updates. This
paper describes the news sources and some of their characteristics, the various processing
steps that are used in generating the data, some comparisons with the KEDS
Levant/Reuters and ICEWS/Asia data sets, and some visualizations. We conclude
with an outline of planned enhancements to the data in the near future: these include
recoding with new WordNet-enhanced dictionaries, the extension of the CAMEO coding
to incorporate codes for financial events, disease outbreaks and natural disasters,
and the development of an open-source Python-based successor to Tabari which will
use parsed input from existing natural language processing tools.