Performance Analysis and Enabling of the RayBen Code for the Intel® MIC Architecture

This title appears in the Scientific Report: 2014

Personal Name(s):	Schnurpfeil, Alexander (Corresponding Author)
	Janetzko, Florian / Janetzko, Stefanie / Thust, Kay / Emran, M. S. / Schumacher, J.
Contributing Institute:	Jülich Supercomputing Centre (JSC)
Published in:	2014
Imprint:	PRACE Consortium Partners 2014
Physical Description:	9 p.
Document Type:	Report
Research Program:	PRACE - First Implementation Phase Project / Supercomputer Facility
Link:	Full text (Open Access)
	JuSER publication portal

Please use the identifier: http://hdl.handle.net/2128/8332 in citations.

The subject of this project is the analysis and enabling of the RayBen code, which implements a finite difference scheme for the simulation of turbulent Rayleigh-Bénard convection in a closed cylindrical cell, for the Intel® Xeon Phi coprocessor architecture. After a brief introduction to the physical background of the code, the integration of RayBen into the JuBE benchmarking environment is discussed. The structure of the code is analysed through its call graph, and the most performance-critical routines are identified. A detailed analysis of the OpenMP parallelization revealed several race conditions, which were eliminated. The code was ported to the JUROPA cluster at the Jülich Supercomputing Centre and to the EURORA cluster at CINECA. The performance of the code is discussed on the basis of pure MPI and hybrid MPI/OpenMP benchmark results. It is shown that RayBen is a memory-intensive application that benefits greatly from MPI parallelization. The offloading mechanism for the Intel® MIC architecture considerably lowers performance, whereas binaries that run exclusively on the coprocessor show satisfactory performance and a scalability comparable to that of the CPU.
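The abstract does not say which race conditions were found in RayBen. Purely as a hedged illustration, the C sketch below shows two common OpenMP race patterns in finite difference loops and their usual fixes: a temporary that is accidentally shared between threads, and an unprotected accumulation into a shared sum. The function name `laplacian_1d` and the 1-D uniform grid are hypothetical simplifications chosen for brevity; RayBen simulates a closed cylindrical cell, and its actual code is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical example, not taken from RayBen: second-order central
 * difference (1-D Laplacian) of u on a uniform grid with spacing h.
 * Writes interior values to lap[1..n-2], sets both boundary entries
 * to zero and returns the sum of squares of the interior values.
 * For n < 2 nothing is written and 0.0 is returned. */
double laplacian_1d(const double *u, double *lap, int n, double h)
{
    if (n < 2) {
        return 0.0;              /* degenerate grid: nothing to do */
    }

    const double inv_h2 = 1.0 / (h * h);
    double sum_sq = 0.0;         /* combined across threads below */

    /* Racy pattern (avoid): declaring the temporary d before the
     * parallel region makes it shared, and a plain "sum_sq += ..."
     * without a reduction clause lets threads overwrite each other.
     * Fix: declare d inside the loop body (private to each iteration)
     * and give every thread its own partial sum via reduction(+:sum_sq). */
    #pragma omp parallel for reduction(+:sum_sq)
    for (int i = 1; i < n - 1; ++i) {
        const double d = (u[i - 1] - 2.0 * u[i] + u[i + 1]) * inv_h2;
        lap[i] = d;              /* each i is written by exactly one thread */
        sum_sq += d * d;
    }

    lap[0] = 0.0;                /* boundary points are not updated */
    lap[n - 1] = 0.0;
    return sum_sq;
}
```

Compile with, e.g., `gcc -fopenmp`; without OpenMP support the pragma is ignored and the loop runs serially with the same result. For `u = {0, 1, 4, 9, 16}` and `h = 1` every interior value is 2 and the function returns 12. Note that for general data a reduction may change the summation order and hence the last bits of the floating-point result compared with a serial sum.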