%0 Conference Paper
%1 Wang:2013:EHA:2492517.2492624
%A Wang, Xinyue
%A Tokarchuk, Laurissa
%A Cuadrado, Félix
%A Poslad, Stefan
%B Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
%C New York, NY, USA
%D 2013
%I ACM
%K Hashtag adaptive crawling exploiting for in streams twitter
%P 311--315
%R 10.1145/2492517.2492624
%T Exploiting Hashtags for Adaptive Microblog Crawling
%U http://doi.acm.org/10.1145/2492517.2492624
%X Researchers have capitalized on microblogging services, such as Twitter, for detecting and monitoring real world events. Existing approaches have based their conclusions on data collected by monitoring a set of pre-defined keywords. In this paper, we show that this manner of data collection risks losing a significant amount of relevant information. We then propose an adaptive crawling model that detects emerging popular hashtags, and monitors them to retrieve greater amounts of highly associated data for events of interest. The proposed model analyzes the traffic patterns of the hashtags collected from the live stream to update subsequent collection queries. To evaluate this adaptive crawling model, we apply it to a dataset collected during the 2012 London Olympic Games. Our analysis shows that adaptive crawling based on the proposed Refined Keyword Adaptation algorithm collects a more comprehensive dataset than pre-defined keyword crawling, while only introducing a minimum amount of noise.
%@ 978-1-4503-2240-9
@inproceedings{Wang:2013:EHA:2492517.2492624,
abstract = {Researchers have capitalized on microblogging services, such as Twitter, for detecting and monitoring real world events. Existing approaches have based their conclusions on data collected by monitoring a set of pre-defined keywords. In this paper, we show that this manner of data collection risks losing a significant amount of relevant information. We then propose an adaptive crawling model that detects emerging popular hashtags, and monitors them to retrieve greater amounts of highly associated data for events of interest. The proposed model analyzes the traffic patterns of the hashtags collected from the live stream to update subsequent collection queries. To evaluate this adaptive crawling model, we apply it to a dataset collected during the 2012 London Olympic Games. Our analysis shows that adaptive crawling based on the proposed Refined Keyword Adaptation algorithm collects a more comprehensive dataset than pre-defined keyword crawling, while only introducing a minimum amount of noise.},
acmid = {2492624},
added-at = {2016-06-29T14:41:19.000+0200},
address = {New York, NY, USA},
author = {Wang, Xinyue and Tokarchuk, Laurissa and Cuadrado, F{\'e}lix and Poslad, Stefan},
biburl = {https://www.bibsonomy.org/bibtex/281c4f98d2f8f3b94bf1ea902c386b2b2/amitl3s},
booktitle = {Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining},
description = {Exploiting hashtags for adaptive microblog crawling},
doi = {10.1145/2492517.2492624},
interhash = {af5fc3bdeb427b6e0def7883553c64bf},
intrahash = {81c4f98d2f8f3b94bf1ea902c386b2b2},
isbn = {978-1-4503-2240-9},
keywords = {Hashtag adaptive crawling exploiting for in streams twitter},
location = {Niagara, Ontario, Canada},
numpages = {5},
pages = {311--315},
publisher = {ACM},
series = {ASONAM '13},
timestamp = {2016-06-29T14:41:19.000+0200},
title = {Exploiting Hashtags for Adaptive Microblog Crawling},
url = {http://doi.acm.org/10.1145/2492517.2492624},
year = {2013}
}