Following up on "KMeans Clustering Now Running on Elastic MapReduce", Stephen Green has generously documented on the Apache Lucene Mahout wiki the steps that were necessary to get an example of k-means clustering up and running on Amazon's Elastic MapReduce (EMR).
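Why k-means suits a MapReduce cluster like EMR: each iteration splits into an independent per-point assignment step and a per-cluster averaging step. The sketch below is a single-machine illustration in plain Python that mirrors that split as a "map" function and a "reduce" function. It is not Mahout's implementation. The function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) are hypothetical, and Mahout's actual job classes and file formats are not shown.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Hypothetical 'map' step: emit (cluster_id, (point, 1)) for every point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, dim):
    """Hypothetical 'reduce' step: average the points assigned to each cluster."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, (point, n) in pairs:
        acc = sums[cid]
        for j, x in enumerate(point):
            acc[j] += x
        counts[cid] += n
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Lloyd-style k-means: repeat map/reduce until the centroids stop moving."""
    centroids = [list(c) for c in centroids]
    dim = len(points[0])
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), dim)
        # A cluster that received no points keeps its previous centroid.
        new = [updated.get(i, centroids[i]) for i in range(len(centroids))]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids
```

For example, `kmeans([(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (9, 8.5), (8.5, 9)], [(1, 1), (8, 8)])` converges after two iterations to the centroids `[[1.0, 1.5], [8.5, 8.5]]`. On a real cluster the map step runs in parallel over input splits, and only the small per-cluster sums need to be shuffled to the reducers.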
S. Basu, A. Banerjee, and R. Mooney. Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 333–344. Lake Buena Vista, FL: Society for Industrial and Applied Mathematics, April 2004.
A. Phansalkar, A. Joshi, L. Eeckhout, and L. John. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2005), pp. 10–20. March 2005.
A. Hotho, A. Maedche, and S. Staab. ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 607–608. Washington, DC, USA: IEEE Computer Society, 2001.
D. Cutting, D. Karger, J. Pedersen, and J. Tukey. SIGIR '92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 318–329. New York, NY, USA: ACM Press, 1992.
G. Hamerly and C. Elkan. CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 600–607. New York, NY, USA: ACM, 2002.
D. Arthur and S. Vassilvitskii. SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2007.