Spark is a fast, in-memory cluster computing framework with a language-integrated interface in Scala. It shines at iterative MapReduce (e.g. machine learning) and interactive data mining, where keeping data in memory provides substantial speedups.
Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, far more quickly than disk-based systems such as Hadoop MapReduce.
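The iterative, in-memory style described above can be sketched in a few lines of Scala. This is a minimal illustration written against plain Scala collections rather than Spark's RDD API (so it runs without a cluster); the dataset, step size, and the `fit` helper are hypothetical, chosen only to show a loop that re-scans cached data each pass.

```scala
// A toy fixed-point iteration over an in-memory dataset, mimicking the
// pattern Spark encourages: load data once, then query it repeatedly.
// Plain Scala collections stand in for RDDs; data here is hypothetical.
object IterativeSketch {
  // Each iteration re-scans the cached collection, the way a Spark job
  // re-scans a cached RDD instead of re-reading from disk.
  def fit(points: Seq[Double], iterations: Int): Double = {
    var w = 0.0
    for (_ <- 1 to iterations) {
      val gradient = points.map(p => p - w).sum / points.size
      w += 0.5 * gradient // w converges toward the mean of the points
    }
    w
  }

  def main(args: Array[String]): Unit =
    println(f"w = ${fit(Seq(1.0, 2.0, 3.0, 4.0), 10)}%.4f")
}
```

In actual Spark code the collection would instead be an RDD built with `sc.parallelize(...)` or `sc.textFile(...)` and pinned in memory with `.cache()`; the per-iteration `map` and `sum` would then execute across the cluster, which is where the speedup over disk-based re-reads comes from.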
Y. Hold-Geoffroy, O. Gagnon, and M. Parizeau. Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, page 60. ACM, 2014.
N. Vasilache, M. Baskaran, B. Meister, and R. Lethin. Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pages 42--53. New York, NY, USA: ACM, 2013.
D. Cordes, A. Heinig, P. Marwedel, and A. Mallik. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), pages 699--706. December 2011.
A. Welc, S. Jagannathan, and A. Hosking. Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, pages 439--453. ACM, 2005.
O. Tripp, G. Yorsh, J. Field, and M. Sagiv. Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, pages 207--224. ACM, 2011.
M. Scott, T. LeBlanc, and B. Marsh. PPOPP '90: Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, pages 70--78. New York, NY, USA: ACM, 1990.