Description: Portable multi-node LQCD Monte Carlo simulations using OpenACC

This title appears in the Scientific Report : 2018

Portable multi-node LQCD Monte Carlo simulations using OpenACC

Personal Name(s):	Bonati, Claudio
	Calore, Enrico / D’Elia, Massimo / Mesiti, Michele / Negro, Francesco / Sanfilippo, Francesco / Schifano, Sebastiano Fabio / Silvi, Giorgio (Corresponding author) / Tripiccione, Raffaele
Contributing Institute:	Jülich Supercomputing Center; JSC
Published in:	International journal of modern physics / C, 29 (2018) 01, p. 1850010
Imprint:	Singapore [u.a.] World Scientific 2018
DOI:	10.1142/S0129183118500109
Document Type:	Journal Article
Research Program:	Doctoral student without special funding; Computational Science and Mathematical Methods
Link:	OpenAccess
	Publication portal JuSER

Please use the identifier: http://dx.doi.org/10.1142/S0129183118500109 in citations.
Please use the identifier: http://hdl.handle.net/2128/18722 in citations.

This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved through the OpenACC parallel programming model, which is used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization across multiple computing nodes, using OpenACC to manage parallelism within a node and OpenMPI to manage parallelism among nodes. We first discuss the strategies available to maximize performance, then describe selected relevant details of the code, and finally measure the performance and scaling that we are able to achieve. The work focuses mainly on GPUs, which offer a high level of performance for this application, but also compares with results measured on other processors.