MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data"

Multilingual and Multimodal Information Access Evaluation, volume 6360 of Lecture Notes in Computer Science, pages 64-69. Berlin, Springer Verlag, (2010)

Abstract

We propose using MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. In a small case study, we use a cluster of 15 low-cost machines to search a web crawl of 0.5 billion pages, showing that sequential scanning is a viable approach to running large-scale information retrieval experiments with little effort. The code is available to other researchers at: http://mirex.sourceforge.net.
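
The sequential-scanning idea in the abstract can be illustrated with a minimal, single-machine sketch. This is not the code released at the URL above. The function names (`map_document`, `reduce_query`, `sequential_scan`), the toy term-count score, and the sample documents are all assumptions made for illustration. The map step scans every document against every query with no index, and the reduce step keeps the top-scoring documents per query.

```python
import heapq
from collections import Counter, defaultdict


def map_document(doc_id, text, queries):
    """Map step: scan one document and emit (query_id, (score, doc_id)) pairs."""
    term_counts = Counter(text.lower().split())
    for query_id, query in queries.items():
        # Toy score (an assumption): total occurrences of the query terms.
        score = sum(term_counts[term] for term in query.lower().split())
        if score > 0:
            yield query_id, (score, doc_id)


def reduce_query(scored_docs, k=2):
    """Reduce step: keep the k highest-scoring documents for one query."""
    return heapq.nlargest(k, scored_docs)


def sequential_scan(documents, queries, k=2):
    """Run map over every document, group by query, then reduce per query."""
    grouped = defaultdict(list)
    for doc_id, text in documents:
        for query_id, scored in map_document(doc_id, text, queries):
            grouped[query_id].append(scored)
    return {qid: reduce_query(scored, k) for qid, scored in grouped.items()}


# Hypothetical sample data, standing in for a web crawl and a query set.
documents = [
    ("d1", "MapReduce makes sequential scanning simple"),
    ("d2", "Inverted indexes speed up retrieval"),
    ("d3", "Sequential scanning with MapReduce scales out MapReduce jobs"),
]
queries = {"q1": "mapreduce scanning", "q2": "retrieval"}

results = sequential_scan(documents, queries)
print(results)
```

On a real cluster, the grouping done here by `defaultdict` would be handled by the framework's shuffle phase, and each mapper would see only its own share of the documents. Trying a new retrieval approach would then mostly mean swapping the scoring line in the map step.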
