Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world presents a number of interesting challenges. Designing such systems requires making complex tradeoffs along several dimensions, including (a) the number of user queries that must be handled per second and the response latency for those queries, (b) the number and size of the various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms used for retrieval. In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I'll describe some future challenges and open research problems in this area.
T. Kenter, M. Wevers, P. Huijnen, and M. de Rijke. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, page 1191--1200. New York, NY, USA, ACM, (2015)
Y. Ibrahim, M. Amir Yosef, and G. Weikum. Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval, page 17--19. New York, NY, USA, ACM, (2014)
A. Argaw and L. Asker. Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources, page 104--110. Stroudsburg, PA, USA, Association for Computational Linguistics, (2007)
I. Witten and D. Milne. Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, page 25--30. Chicago, USA, AAAI Press, (2008)
Y. Duan, L. Jiang, T. Qin, M. Zhou, and H. Shum. Proceedings of the 23rd International Conference on Computational Linguistics, page 295--303. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)
H. Sun, M. Srivatsa, S. Tan, Y. Li, L. Kaplan, S. Tao, and X. Yan. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 1486--1495. New York, NY, USA, ACM, (2014)
T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau, and M. Dredze. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, page 80--88. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)