This title appears in the Scientific Report: 2023
Please use the identifier http://dx.doi.org/10.34734/FZJ-2023-04971 in citations.
Hands-on Practical Hybrid Parallel Application Performance Engineering
Personal Name(s): | Geimer, Markus (Corresponding author) / Shende, Sameer / Wesarg, Bert / Wylie, Brian J. N. (Corresponding author) |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Imprint: | 2023 |
DOI: | 10.34734/FZJ-2023-04971 |
Conference: | The International Conference for High Performance Computing, Networking, Storage, and Analysis '23, Denver, CO (USA), 2023-11-13 - 2023-11-13 |
Document Type: | Lecture |
Research Program: | DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES / Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups |
Link: | OpenAccess / Publikationsportal JuSER |
This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and the increasingly common use of accelerators. Parallel performance tools from the Virtual Institute – High Productivity Supercomputing (VI-HPS) are introduced and featured in hands-on exercises with Score-P, Scalasca, Vampir, and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, tuning, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Using their own notebook computers, participants will conduct exercises on quad-A100 GPU nodes of the JUWELS-Booster modular supercomputer via the Jupyter-JSC service. This will help prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.