TwitterEcho: A Distributed Focused Crawler to Support Open Research with Twitter Data

M. Bosnjak, E. Oliveira, J. Martins, E. Mendes Rodrigues, and L. Sarmento.
Proceedings of the 21st International Conference on World Wide Web, pages 1233--1240. New York, NY, USA, ACM (2012)
DOI: 10.1145/2187980.2188266

Abstract

Modern social network analysis relies on vast quantities of data to infer new knowledge about human relations and communication. In this paper we describe TwitterEcho, an open source Twitter crawler for supporting this kind of research, which is characterized by a modular distributed architecture. Our crawler enables researchers to continuously collect data from particular user communities, while respecting Twitter's imposed limits. We present the core modules of the crawling server, some of which were specifically designed to focus the crawl on the Portuguese Twittosphere. Additional modules can be easily implemented, thus changing the focus to a different community. Our evaluation of the system shows high crawling performance and coverage.

BibTeX key: Bosnjak:2012:TDF:2187980.2188266
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 21st International Conference on World Wide Web
year: 2012
pages: 1233--1240
publisher: ACM
series: WWW '12 Companion
acmid: 2188266
isbn: 978-1-4503-1230-1
location: Lyon, France
numpages: 8
DOI: 10.1145/2187980.2188266
url: http://doi.acm.org/10.1145/2187980.2188266
