Abstract
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code
for staggered fermions, purposely designed to be portable across different
computer architectures, including GPUs and commodity CPUs. Portability is
achieved using the OpenACC parallel programming model, which allows a single
code base to be compiled for several processor architectures. The paper focuses on
parallelization on multiple computing nodes using OpenACC to manage parallelism
within the node, and OpenMPI to manage parallelism among the nodes. We first
discuss the strategies that can be adopted to maximize performance, then
describe selected relevant details of the code, and finally measure the
level of performance and scaling that we are able to achieve. The
work focuses mainly on GPUs, which offer particularly high performance for
this application, but these results are also compared with those measured on
other processors.