Following up on KMeans Clustering Now Running on Elastic MapReduce, Stephen Green has generously documented, on the Apache Lucene Mahout wiki, the steps that were necessary to get an example of k-Means clustering up and running on Amazon’s Elastic MapReduce (EMR).
S. Basu, A. Banerjee, and R. Mooney. Proceedings of the 2004 SIAM International Conference on Data Mining, pages 333--344. Lake Buena Vista, FL, USA, Society for Industrial and Applied Mathematics, (April 2004)
D. Cutting, D. Karger, J. Pedersen, and J. Tukey. SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 318--329. New York, NY, USA, ACM Press, (1992)
D. Arthur and S. Vassilvitskii. SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027--1035. Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, (2007)