The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise:

  • tweets_2014_researcher.tsv.bz2 - a set of 989,529 tweet-URL pairs from 2014 from 6,271 users of the computer scientists sample in https://zenodo.org/record/12942, specified by time, tweet id, user id, and URL
  • tweets_2014_sample.tsv.bz2 - a set of 300,053,850 tweet ids from the 1% Twitter stream sample from 2014
  • tweets_2014_sample_6694_users.tsv.bz2 - a set of 605,080 tweet-URL pairs from the 1% Twitter stream sample from 2014 for 6,694 users, specified by time, tweet id, user id, and URL
  • MAG_hosts_10000.tsv - a set of the top 10,000 host names from the Microsoft Academic Graph data (http://blogs.msdn.com/b/msr_er/archive/2015/06/26/announcing-the-microsoft-academic-graph-let-the-research-begin.aspx), specified by rank, URL count, and host name
  • url_shortening_services.tsv - a set of 340 host names of URL shortening services

In addition, the following rankings (based on the odds ratio) of domains, hosts, and URLs that appear in both the researcher dataset and the sample are included:

  • domains_by_odds_ratio.tsv.bz2 - a ranking of 61,860 domains
  • hosts_by_odds_ratio.tsv.bz2 - a ranking of 80,384 hosts
  • publisher_domains_by_odds_ratio.tsv.bz2 - a ranking of 924 publisher domains
  • publisher_urls_by_odds_ratio.tsv.bz2 - a ranking of 4,227 publisher URLs
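
The description does not say exactly how the odds ratio behind the rankings was computed, so the following Python sketch is only one plausible reading, not the authors' method. It assumes the tweet-URL files are tab-separated, have no header row, and list their columns in the stated order (time, tweet id, user id, URL). The function names `open_tsv`, `read_hosts`, `odds_ratio`, and `rank_hosts` are hypothetical, and the odds ratio used is the standard 2x2 cross-product form without smoothing.

```python
import bz2
import csv
from collections import Counter
from urllib.parse import urlsplit


def open_tsv(path):
    """Open a bz2-compressed TSV file as text (UTF-8 is an assumption)."""
    return bz2.open(path, mode="rt", encoding="utf-8", newline="")


def read_hosts(lines):
    """Count host names in rows of (time, tweet id, user id, URL).

    The column order is assumed from the dataset description.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue  # skip rows that do not have all four columns
        try:
            host = urlsplit(row[3]).hostname
        except ValueError:
            continue  # skip URLs that cannot be parsed
        if host:
            counts[host] += 1
    return counts


def odds_ratio(a, total_a, b, total_b):
    """Odds of a host among researcher URLs divided by its odds in the sample.

    a, b: URL counts for the host in the researcher data and in the sample.
    total_a, total_b: total URL counts in each dataset.
    """
    if total_a == a:
        return float("inf")  # the host accounts for every researcher URL
    return (a * (total_b - b)) / (b * (total_a - a))


def rank_hosts(researcher, sample):
    """Rank hosts that occur in both datasets by descending odds ratio."""
    total_r = sum(researcher.values())
    total_s = sum(sample.values())
    shared = researcher.keys() & sample.keys()
    ranking = [
        (host, odds_ratio(researcher[host], total_r, sample[host], total_s))
        for host in shared
    ]
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

With the files downloaded, something like `rank_hosts(read_hosts(open_tsv("tweets_2014_researcher.tsv.bz2")), read_hosts(open_tsv("tweets_2014_sample_6694_users.tsv.bz2")))` would produce a host ranking under these assumptions. Restricting the ranking to hosts present in both datasets mirrors the description and ensures the sample count in the denominator is never zero.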

  • @jaeschke
