Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose

F. Morstatter, J. Pfeffer, H. Liu, and K. Carley. International AAAI Conference on Web and Social Media, 2013.

Abstract

Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It provides a glance into its millions of users and billions of tweets through a "Streaming API" which provides a sample of all tweets matching some parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge in accordance with a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.

Links and resources

BibTeX key: ICWSM136071
entry type: inproceedings
booktitle: International AAAI Conference on Web and Social Media
year: 2013
url: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6071

Tags

twittdiff

Cite this publication

@inproceedings{ICWSM136071,
  abstract = {Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It provides a glance into its millions of users and billions of tweets through a "Streaming API" which provides a sample of all tweets matching some parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge in accordance with a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.},
  added-at = {2015-12-09T17:43:28.000+0100},
  author = {Morstatter, Fred and Pfeffer, Jürgen and Liu, Huan and Carley, Kathleen},
  biburl = {https://www.bibsonomy.org/bibtex/2ba8782d3c478b90495b16ff4092001c2/tomhanika},
  booktitle = {International AAAI Conference on Web and Social Media},
  interhash = {bca742d25a5f5fa43c8f106460449b5b},
  intrahash = {ba8782d3c478b90495b16ff4092001c2},
  keywords = {twittdiff},
  timestamp = {2015-12-09T17:43:28.000+0100},
  title = {Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose},
  url = {http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6071},
  year = 2013
}

BibSonomy