Earlier this week the UK Conservative party promised to offer a £1m cash prize to a person or team that creates an online platform that can be used to solve “common problems”. The prize – which the party says will
1st Part: "Prof. Bruns, Prof. Burgess & Dr. Woodford: Mapping Online Publics: New Methods for Twitter Research"
2nd Part: "Robert Jäschke: Identifying and Analyzing Researchers on Twitter"
Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I'll describe some future challenges and open research problems in this area.
M. Schwab, R. Jäschke, F. Fischer, and J. Strötgen. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 6239--6244. Association for Computational Linguistics (November 2019)
M. Schwab, R. Jäschke, and F. Fischer. Proceedings of the 5th International Conference on Natural Language and Speech Processing, pages 282--287. Association for Computational Linguistics (2022)