Description: Multi-threaded Construction of Neighbour Lists for Particle Systems in OpenMP

This title appears in the Scientific Report : 2016

Personal Name(s):	Halver, Rene (Corresponding author)
	Sutmann, Godehard
Contributing Institute:	Jülich Supercomputing Center; JSC
Published in:	Parallel Processing and Applied Mathematics / Wyrzykowski, Roman (Editor), ISBN: 978-3-319-32151-6 (print), 978-3-319-32152-3 (electronic)
Imprint:	Cham Springer International Publishing 2016
Physical Description:	153 - 165
ISBN:	978-3-319-32151-6 (print) 978-3-319-32152-3 (electronic)
DOI:	10.1007/978-3-319-32152-3_15
Conference:	11th International Conference on Parallel Processing and Applied Mathematics, Krakow (Poland), 2015-09-06 - 2015-09-09
Document Type:	Contribution to a conference proceedings
Research Program:	Computational Science and Mathematical Methods
Series Title:	Lecture Notes in Computer Science 9574
	JuSER Publications Portal

Please use the identifier: http://dx.doi.org/10.1007/978-3-319-32152-3_15 in citations.

The construction of neighbour lists based on the linked cell method is investigated in the context of particle simulation methods within the OpenMP shared memory programming model. Various implementations that avoid memory collisions and race conditions are studied. Performance and optimisation are discussed together with runtime behaviour and memory requirements. Performance models are proposed that reproduce the measured runtime behaviour and provide insight into how performance depends on specific system parameters. Benchmarks are performed for the different implementations on a number of multi-core architectures. On the Xeon Phi architecture in SMT mode, thread counts of up to 240 are considered, so that performance can be studied for a large number of threads working concurrently on the construction of linked cells on a shared memory partition.
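
To illustrate the kind of problem the abstract refers to, the following is a minimal sketch of one common way to build linked-cell lists without race conditions in OpenMP: each thread atomically swaps its particle index into the head of the target cell. This is not taken from the paper and is not claimed to be one of the implementations it studies. The names `cell_of`, `build_linked_cells` and `count_in_cell`, the cubic box of edge length `box`, and the `ncell` x `ncell` x `ncell` grid are assumptions made for the example. The swap form of `atomic capture` requires OpenMP 4.0 or later.

```c
#include <assert.h>

/* Hypothetical helper: map a coordinate in [0, box) to a cell index
   in [0, ncell), clamping values that fall outside the box. */
static int cell_of(double x, double box, int ncell) {
    int c = (int)(x / box * ncell);
    if (c < 0) c = 0;
    if (c >= ncell) c = ncell - 1;
    return c;
}

/* Build linked-cell lists for n particles.
   head[c] holds the first particle of cell c (or -1 if the cell is empty),
   next[i] holds the particle following i in the same cell (or -1).
   head must have ncell^3 entries, next must have n entries. */
static void build_linked_cells(const double *x, const double *y, const double *z,
                               int n, double box, int ncell,
                               int *head, int *next) {
    assert(ncell > 0 && box > 0.0);
    int total = ncell * ncell * ncell;

    /* Mark all cells as empty. */
    #pragma omp parallel for
    for (int c = 0; c < total; ++c) {
        head[c] = -1;
    }

    /* Several threads may target the same cell, so the update of head[c]
       must be atomic; otherwise two particles could overwrite each other. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int cx = cell_of(x[i], box, ncell);
        int cy = cell_of(y[i], box, ncell);
        int cz = cell_of(z[i], box, ncell);
        int c = (cz * ncell + cy) * ncell + cx;
        int old;
        /* Atomic swap: read the previous head and install particle i. */
        #pragma omp atomic capture
        { old = head[c]; head[c] = i; }
        /* Only this iteration writes next[i], so no atomic is needed here. */
        next[i] = old;
    }
}

/* Count the particles of one cell by walking its chain. */
static int count_in_cell(const int *head, const int *next, int c) {
    int count = 0;
    for (int i = head[c]; i != -1; i = next[i]) {
        ++count;
    }
    return count;
}
```

The order of particles within a cell depends on thread scheduling, but the set of particles per cell does not. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result. Atomic updates are only one option; alternatives such as per-thread private lists that are merged afterwards trade extra memory for less contention, which is the kind of trade-off between runtime and memory requirements the abstract mentions.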