This title appears in the Scientific Report: 2018
Please use the identifier http://hdl.handle.net/2128/21382 in citations.
Three Dirac operators on two architectures with one piece of code and no hassle
Personal Name(s): | Durr, Stephan (Corresponding author) |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Published in: | S. 033 |
Imprint: | Trieste: SISSA, 2018 |
Physical Description: | 7 p. |
Conference: | 36th Annual International Symposium on Lattice Field Theory, Lattice 2018, East Lansing (USA), 2018-07-22 - 2018-07-28 |
Document Type: | Contribution to a book; Contribution to a conference proceedings |
Research Program: | Computational Science and Mathematical Methods |
Series Title: | Proceedings of Science; LATTICE2018 |
Link: | OpenAccess; Publikationsportal JuSER |
A simple-minded approach to implementing three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and Core i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and/or assembly tuning. The implementation handles N_v right-hand sides, and this extra index is used to fill the SIMD pipeline. On one KNL node, the single-precision performance figures for N_c=3, N_v=12 are 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the three discretization schemes, respectively.
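The idea of using the right-hand-side index to fill the SIMD pipeline can be illustrated with a minimal sketch in C. This is not the paper's code, and it is not a full Dirac operator: it only shows one building block, the multiplication of a complex N_c x N_c link matrix with N_v color vectors per site. The type names `link_t` and `site_t`, the function `apply_links`, and the split real/imaginary storage with the right-hand-side index innermost are all assumptions made for this illustration. The values N_c=3 and N_v=12 are taken from the abstract.

```c
#include <assert.h>
#include <stddef.h>

#define NC 3   /* number of colors, N_c in the abstract */
#define NV 12  /* number of right-hand sides, N_v in the abstract */

/* Hypothetical layout: one complex NC x NC matrix per site,
   stored as separate real and imaginary parts. */
typedef struct { float re[NC][NC]; float im[NC][NC]; } link_t;

/* Hypothetical layout: NV color vectors per site, with the
   right-hand-side index innermost so that it has unit stride. */
typedef struct { float re[NC][NV]; float im[NC][NV]; } site_t;

/* out[x] = U[x] * in[x] for every site x, for all NV right-hand sides.
   Sites are distributed over threads; the right-hand-side loop is
   offered to the compiler for vectorization. */
void apply_links(long nsites, const link_t *U, const site_t *in, site_t *out)
{
    assert(U != NULL && in != NULL && out != NULL);

    #pragma omp parallel for
    for (long x = 0; x < nsites; x++) {
        for (int a = 0; a < NC; a++) {
            float acc_re[NV] = {0.0f};
            float acc_im[NV] = {0.0f};
            for (int b = 0; b < NC; b++) {
                const float ur = U[x].re[a][b];
                const float ui = U[x].im[a][b];
                /* Complex multiply-accumulate, one SIMD lane per right-hand side. */
                #pragma omp simd
                for (int v = 0; v < NV; v++) {
                    acc_re[v] += ur * in[x].re[b][v] - ui * in[x].im[b][v];
                    acc_im[v] += ur * in[x].im[b][v] + ui * in[x].re[b][v];
                }
            }
            for (int v = 0; v < NV; v++) {
                out[x].re[a][v] = acc_re[v];
                out[x].im[a][v] = acc_im[v];
            }
        }
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. The point of the layout is that the innermost loop has a fixed trip count and unit stride, which is what lets a compiler vectorize it without hand-written intrinsics or assembly.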