bookmarks  1


    The files in this dataset are used to analyse the tweeting behaviour of computer scientists on Twitter. They comprise: a set of 989,529 tweet-URL pairs (tweets_2014_researcher.tsv.bz2) from 2014, posted by 6,271 users of the computer scientists sample, each specified by time, tweet id, user id, and URL; a set of 300,053,850 tweet ids (tweets_2014_sample.tsv.bz2) from the 1% Twitter stream sample from 2014; a set of 605,080 tweet-URL pairs (tweets_2014_sample_6694_users.tsv.bz2) from the 1% Twitter stream sample from 2014 for 6,694 users, each specified by time, tweet id, user id, and URL; a set of the top 10,000 host names (MAG_hosts_10000.tsv) from the Microsoft Academic Graph data, each specified by rank, URL count, and host name; and a set of 340 host names of URL shortening services (url_shortening_services.tsv). In addition, the dataset includes the following rankings, based on the odds ratio, of domains, hosts, and URLs that appear in both the researcher dataset and the sample: domains_by_odds_ratio.tsv.bz2 (a ranking of 61,860 domains), hosts_by_odds_ratio.tsv.bz2 (a ranking of 80,384 hosts), publisher_domains_by_odds_ratio.tsv.bz2 (a ranking of 924 publisher domains), and publisher_urls_by_odds_ratio.tsv.bz2 (a ranking of 4,227 publisher URLs).
    7 years ago by @jaeschke
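
    As a rough illustration of how a ranking by odds ratio could be derived from the two tweet-URL files, here is a minimal Python sketch. It rests on several assumptions that the description does not confirm: that the columns appear in the order time, tweet id, user id, URL; that the files have no header row; and that the odds ratio of a host is the odds of a URL pointing to that host in the researcher data divided by the same odds in the sample. The function names (read_rows, host_counts, odds_ratios, rank_hosts) are hypothetical, and the dataset authors may have computed their rankings differently.

```python
import bz2
import csv
from collections import Counter
from urllib.parse import urlsplit


def read_rows(path):
    """Yield rows from a bz2-compressed TSV file.

    Assumption: four tab-separated columns (time, tweet id, user id, URL)
    and no header row.
    """
    with bz2.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def host_counts(rows):
    """Count URLs per host name; each row is (time, tweet_id, user_id, url)."""
    counts = Counter()
    for _time, _tweet_id, _user_id, url in rows:
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts


def odds_ratios(researcher, sample):
    """Odds ratio for every host that appears in both counters.

    Assumed definition: (a / b) / (c / d), where a and c are the URL counts
    for the host in the researcher and sample data, and b and d are the
    counts of all other URLs in the respective data.
    """
    n_researcher = sum(researcher.values())
    n_sample = sum(sample.values())
    result = {}
    for host in researcher.keys() & sample.keys():
        a, c = researcher[host], sample[host]
        b, d = n_researcher - a, n_sample - c
        if b == 0 or d == 0:
            continue  # odds undefined when a host accounts for all URLs
        result[host] = (a / b) / (c / d)
    return result


def rank_hosts(researcher_rows, sample_rows):
    """Return (host, odds ratio) pairs, highest odds ratio first."""
    ratios = odds_ratios(host_counts(researcher_rows), host_counts(sample_rows))
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

    Under these assumptions, a call such as rank_hosts(read_rows("tweets_2014_researcher.tsv.bz2"), read_rows("tweets_2014_sample_6694_users.tsv.bz2")) would list first the hosts that researchers link to disproportionately often. Shortened URLs would need to be resolved beforehand, which is presumably what the list of URL shortening services is for.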