This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usu
Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.
S-Match is an open source Java framework for semantic matching. It contains semantic matching, minimal semantic matching and structure preserving semantic matching algorithm implementations.
Scientext is a new, on-line French and English corpus of scientific texts. The corpus includes 4.8 million running tokens in French, 13 million words of research articles in English (medicine and biology), and an English-language sub-corpus of French undergraduate students’ texts (1,1 million words). The corpus is organized to facilitate the linguistic study of authorial position and reasoning in scientific articles through phraseology and lexico-grammatical markers linked to causality.
D. Schmidt, A. Zehe, J. Lorenzen, L. Sergel, S. Düker, M. Krug, und F. Puppe. Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Seite 49--56. Punta Cana, Dominican Republic (online), Association for Computational Linguistics, (November 2021)
D. Schmidt, A. Zehe, J. Lorenzen, L. Sergel, S. Düker, M. Krug, und F. Puppe. Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Seite 49--56. Punta Cana, Dominican Republic (online), Association for Computational Linguistics, (November 2021)
H. Zhang, A. Santos, und J. Freire. Proceedings of the 30th ACM International Conference on Information &$\mathsemicolon$ Knowledge Management, ACM, (Oktober 2021)