This past Friday I was teaching a workshop at Coalition for Queens to prepare alumni from their code school for interviews as software developers. One asked about resources for learning how to choose…
The Net Data Directory collects and shares information on different sources of data about the Internet. For more about the project, see our about page. To get started, use the search box below, or check out our quick start guide.
This dataset is released by Signal Media to facilitate conducting research on news articles. It can be used for submissions to the NewsIR'16 workshop, but it is intended to serve the community for research on news retrieval in general.
The articles of the dataset were originally collected by Moreover Technologies (one of Signal's content providers) from a variety of news sources for a period of 1 month (1-30 September 2015). It contains 1 million articles that are mainly English, but they also include non-English and multi-lingual articles. Sources of these articles include major ones, such as Reuters, in addition to local news sources and blogs.
Microsoft Research collaborates with computer scientists at academic and scientific institutions to promote advances in computing technologies and research.
Map-Reduce is on its way out. But we shouldn’t measure its importance by the number of bytes it crunches, but by the fundamental shift in data processing architectures it helped popularise.
On GovData, the data portal for Germany, data from all levels of government is centrally accessible. The portal is to be operated on a trial basis until 2014 and gradually expanded and optimized along the way. Current information and similar items can be found in the "Neues" section, the GovData blog.
German government agencies and universities hold enormous volumes of data and great stores of knowledge. Not all parties are equally willing to make the records accessible to the public
Anyone who wants to know how far away the nearest daycare center is, or where construction sites are blocking the way, often has to fight through confusing websites. The "Code for Germany" project wants to
Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of
A STATEMENT OF COMMITMENT BY STM PUBLISHERS TO A ROADMAP TO ENABLE TEXT AND DATA MINING (TDM) FOR NON COMMERCIAL SCIENTIFIC RESEARCH IN THE EUROPEAN UNION
The files below contain XML (and only XML) for all the articles in the PMC open access subset. These files were created for users who need PMC XML for data mining and processing purposes, but do not need PDFs, images, or supplementary data.
PlanetData aims to establish a sustainable European community of researchers that supports organizations in exposing their data in an effective and efficient way.
We have arrived in the year 2012, but German transit operators have not. Neither the companies nor the politicians have understood how much innovative power thousands of volunteer developers have to build entirely new transit apps or mobility concepts: for people who travel a lot or commute daily, who face barriers because of a wheelchair or walking aid, or who simply expect more than a boring timetable lookup. That is why we are now taking matters into our own hands and will publish all timetables, as a starting point for new innovation without permission.
Twitter will release its newly acquired real-time data processing system Storm as open source. This makes the technology for parallelizing database queries available to everyone.
Web search engines have changed our lives, enabling instant access to information about subjects that are deeply important to us as well as about passing whims. The search engines that answer our queries also log those queries in order to improve their algorithms. Academic research on search queries has shown that they can provide valuable information on diverse topics, including word and phrase similarity and topical seasonality, that they may even have potential for sociology, and that they can serve as a barometer of the popularity of many subjects. At the same time, individuals are rightly concerned about what accidental leaking or deliberate sharing of this information may mean for their privacy. In this talk I will cover the applications that have benefited from mining query logs, the risk that privacy can be breached by sharing query logs, and current algorithms for mining logs in a way that prevents privacy breaches.
Workshop Topics
Possible topics of the workshop include (but are not limited to):
* Social network analysis
* Bibliometrics
* Community discovery
* Personalization for search and for social interaction
* Recommender systems
* Web mining algorithms
* Applications of social network analysis
* Mining (Collaborative) Tagging Systems (blogs, wikis, etc.)
* Mining social data for multimedia information retrieval
* Opinion mining
This is a repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM, (October 2021)
M. Paris and R. Jäschke. Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, volume 12816 of Lecture Notes in Artificial Intelligence, pages 1-14. Springer, (2021)