Marvin is a deep learning framework designed first and foremost to be hackable. It is naively simple for fast prototyping, uses only basic C/C++, and only calls CUDA and cuDNN as dependencies.
This first post in a series on CUDA C and C++ covers the basic concepts of parallel programming on the CUDA platform with C/C++.
int i = blockDim.x * blockIdx.x + threadIdx.x
A. Dallmann, P. Beck, and J. von Gudenberg. Parallel Processing and Applied Mathematics, volume 8385 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2014)
N. Vasilache, M. Baskaran, B. Meister, and R. Lethin. Proceedings of the 6th Workshop on General Purpose Processor
Using Graphics Processing Units, page 42--53. New York, NY, USA, ACM, (2013)