This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usu
T. McCoy, E. Pavlick, and T. Linzen. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3428--3448. Florence, Italy, Association for Computational Linguistics, (July 2019)
S. Wunderlich, M. Ring, D. Landes, and A. Hotho. International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019) - Seville, Spain, May 13-15, 2019, Proceedings, volume 951 of Advances in Intelligent Systems and Computing, pp. 14--24. Springer, (2019)
K. Jiang, D. Wu, and H. Jiang. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 318--323. (2019)
N. Dehouche and A. Wongkitrungrueng. Proceedings of ANZMAC 2018: The 20th Conference of the Australian and New Zealand Marketing Academy. Adelaide (Australia), 3--5 December. (2018)
Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. Manning. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2369--2380. Brussels, Belgium, Association for Computational Linguistics, (2018)
G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. (2017). arXiv:1702.05373. Comment: The dataset is now available for download from https://www.westernsydney.edu.au/bens/home/reproducible_research/emnist. This link is also included in the revised article.