This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usu
X. Wang, Z. Wang, X. Han, W. Jiang, R. Han, Z. Liu, J. Li, P. Li, Y. Lin, and J. Zhou. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1652--1671. Online, Association for Computational Linguistics, (November 2020)
O. Kashefi and R. Hwa. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 200--208. Online, Association for Computational Linguistics, (November 2020)
R. Bommasani and C. Cardie. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8075--8096. Online, Association for Computational Linguistics, (November 2020)
T. McCoy, E. Pavlick, and T. Linzen. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428--3448. Florence, Italy, Association for Computational Linguistics, (July 2019)
S. Wunderlich, M. Ring, D. Landes, and A. Hotho. International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019) - Seville, Spain, May 13-15, 2019, Proceedings, volume 951 of Advances in Intelligent Systems and Computing, pages 14--24. Springer, (2019)