This title appears in the Scientific Report: 2009
Please use the identifier http://dx.doi.org/10.1016/j.parco.2009.02.003 in citations.
A scalable tool architecture for diagnosing wait states in massively parallel applications
Personal Name(s): | Geimer, M. / Wolf, F. / Wylie, B. / Mohr, B. |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC JARA - HPC; JARA-HPC |
Published in: | Parallel Computing, 35 (2009), pp. 375 - 388 |
Imprint: | Amsterdam [et al.]: North-Holland, Elsevier Science, 2009 |
Physical Description: | 375 - 388 |
DOI: | 10.1016/j.parco.2009.02.003 |
Document Type: | Journal Article |
Research Program: | Scientific Computing |
Series Title: | Parallel Computing 35 |
When scaling message-passing applications to thousands of processors, their performance is often affected by wait states that occur when processes fail to reach synchronization points simultaneously. As a first step in reducing the performance impact, we have shown in our earlier work that wait states can be diagnosed by searching event traces for characteristic patterns. However, our initial sequential search method did not scale beyond several hundred processes. Here, we present a scalable approach, based on a parallel replay of the target application's communication behavior, that can efficiently identify wait states at the previously inaccessible scale of 65,536 processes and that has potential for even larger configurations. We explain how our new approach has been integrated into a comprehensive parallel tool architecture, which we use to demonstrate that wait states may consume a major fraction of the execution time at larger scales. (C) 2009 Elsevier B.V. All rights reserved.