This collection consists of ~20M web queries from ~650k users, gathered over three months.
The data is sorted by anonymous user ID and sequentially arranged.
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usu
G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. (2017). arXiv:1702.05373. Comment: The dataset is now available for download from https://www.westernsydney.edu.au/bens/home/reproducible_research/emnist. This link is also included in the revised article.
P. Wu, Y. Lee, H. Tseng, H. Ho, M. Yang, and S. Chien. 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), pages 186-191. IEEE Computer Society, (2017)