In this paper we show that citation counts and Mendeley readership are poor indicators of research excellence. Our experimental design builds on the assumption that a good evaluation metric should be able to distinguish publications that have changed a research field from those that have not. The experiment has been conducted on a new dataset for bibliometric research which we call TrueImpactDataset. TrueImpactDataset is a collection of research publications of two types -- research papers which are considered seminal work in their area and papers which provide a survey (a literature review) of a research area. The dataset also contains related metadata, which include DOIs, titles, authors and abstracts. We describe how the dataset was built and provide overview statistics of the dataset. We propose to use the dataset for validating research evaluation metrics. By using this data, we show that widely used research metrics only poorly distinguish excellent research.