Abstract
Recent advances in neuroscientific understanding have
highlighted the highly parallel computational power of the
mammalian neocortex. In this paper we describe a
GPGPU-accelerated implementation of an intelligent learning
model inspired by the structural and functional properties of
the neocortex. Furthermore, we consider two inefficiencies
inherent to our initial implementation and propose software
optimizations to mitigate such problems. Analysis of our
application’s behavior and performance
provides important insights into the GPGPU architecture,
including the number of cores, the memory system, atomic
operations, and the global thread scheduler. Additionally, we
create a runtime profiling tool for the cortical network that
proportionally distributes work across the host CPU as well
as multiple GPGPUs available to the system. Using the
profiling tool with these optimizations on
Nvidia’s CUDA framework, we achieve up to
60× speedup over a single-threaded CPU
implementation of the model.