Description: Hands-on Practical Hybrid Parallel Application Performance Engineering

This title appears in the Scientific Report : 2023

Personal Name(s):	Geimer, Markus (Corresponding author)
	Shende, Sameer / Wesarg, Bert / Wylie, Brian J. N. (Corresponding author)
Contributing Institute:	Jülich Supercomputing Center; JSC
Imprint:	2023
DOI:	10.34734/FZJ-2023-04971
Conference:	The International Conference for High Performance Computing, Networking, Storage, and Analysis '23, Denver, CO (USA), 2023-11-13 - 2023-11-13
Document Type:	Lecture
Research Program:	DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups
Link:	OpenAccess
	JuSER publication portal

Please use the identifier: http://dx.doi.org/10.34734/FZJ-2023-04971 in citations.

This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and the increasingly common use of accelerators. Parallel performance tools from the Virtual Institute – High Productivity Supercomputing (VI-HPS) are introduced and featured in hands-on exercises with Score-P, Scalasca, Vampir, and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, tuning, and visualization. Emphasis is placed on how the tools are used in combination to identify performance problems and investigate optimization alternatives. Using their own notebook computers, participants will conduct exercises on quad-A100 GPU nodes of the JUWELS-Booster modular supercomputer via the Jupyter-JSC service. These exercises will help prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.