This title appears in the Scientific Report: 2014
Please use the identifier http://hdl.handle.net/2128/8332 in citations.
Performance Analysis and Enabling of the RayBen Code for the Intel® MIC Architecture
Personal Name(s): | Schnurpfeil, Alexander (Corresponding Author) / Janetzko, Florian / Janetzko, Stefanie / Thust, Kay / Emran, M. S. / Schumacher, J. |
---|---|
Contributing Institute: | Jülich Supercomputing Center (JSC) |
Published in: | 2014 |
Imprint: | PRACE Consortium Partners, 2014 |
Physical Description: | 9 p. |
Document Type: | Report |
Research Program: | PRACE - First Implementation Phase Project Supercomputer Facility |
Link: | Full text (Open Access), Publikationsportal JuSER |
The subject of this project is the analysis and enabling of the RayBen code for the Intel® Xeon Phi coprocessor architecture. RayBen implements a finite difference scheme for the simulation of turbulent Rayleigh-Bénard convection in a closed cylindrical cell. After a brief introduction to the physical background of the code, the integration of RayBen into the benchmarking environment JuBE is discussed. The structure of the code is analysed through its call graph, and the most performance-critical routines were identified. A detailed analysis of the OpenMP parallelization revealed several race conditions, which were eliminated. The code was ported to the JUROPA cluster at the Jülich Supercomputing Center as well as to the EURORA cluster at CINECA. The performance of the code is discussed using the results of pure MPI and hybrid MPI/OpenMP benchmarks. It is shown that RayBen is a memory-intensive application that benefits greatly from the MPI parallelization. The offloading mechanism for the Intel® MIC architecture considerably lowers the performance, whereas binaries that run exclusively on the coprocessor show a satisfactory performance and a scalability comparable to that of the CPU.
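The abstract does not state what the race conditions in RayBen's OpenMP code were. As a generic, hypothetical illustration only (this is not RayBen's code; the function `diffuse_step`, the 1D grid, and the coefficient `alpha` are assumptions), the following C sketch shows two common races in an OpenMP-parallel finite-difference loop and one standard way to remove them: a scratch variable declared outside the parallel loop is shared by all threads, and an accumulation into a shared variable is an unsynchronized read-modify-write. Declaring the scratch variable inside the loop body makes it private, and a `reduction` clause gives each thread its own partial sum.

```c
/* Hypothetical stand-in for a finite-difference kernel (not taken from RayBen).
 * Performs one explicit diffusion step on a 1D grid of n >= 2 points and
 * returns the sum of squared updates. Boundary values are copied unchanged. */
double diffuse_step(const double *u, double *u_new, int n, double alpha)
{
    double change_sq = 0.0; /* combined from per-thread copies by the reduction */

    /* Racy variant (do not use):
     *
     *   double lap;                      // shared by all threads
     *   #pragma omp parallel for
     *   for (int i = 1; i < n - 1; ++i) {
     *       lap = u[i - 1] - 2.0 * u[i] + u[i + 1];
     *       u_new[i] = u[i] + alpha * lap;
     *       change_sq += (alpha * lap) * (alpha * lap);  // unsynchronized update
     *   }
     */

    #pragma omp parallel for reduction(+:change_sq)
    for (int i = 1; i < n - 1; ++i) {
        /* Declared inside the loop body, so each thread has its own copy. */
        double lap = u[i - 1] - 2.0 * u[i] + u[i + 1];
        double du = alpha * lap;
        u_new[i] = u[i] + du;   /* each i is written by exactly one thread */
        change_sq += du * du;   /* safe: private partial sums are combined */
    }

    u_new[0] = u[0];
    u_new[n - 1] = u[n - 1];
    return change_sq;
}
```

Compile with, for example, `gcc -fopenmp`. Without OpenMP support the pragma is ignored and the loop runs serially; the result is the same up to floating-point rounding in the order of the reduction.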