An online repository of large data sets for researchers in knowledge discovery and data mining. It includes Discrete Sequence Data, Image Data, Multivariate Data, Relational Data, Spatio-Temporal Data, Text (corpora), Time Series, and Web Data (web pages and log files).
The data here is useful for testing classification and clustering methods and for evaluating the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing.
GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota. Its datasets include MovieLens, Wikilens, Book-Crossing, Jester Joke, and EachMovie.