Staff View: Timestamp Synchronization of Concurrent Events

This title appears in the Scientific Report : 2010

Timestamp Synchronization of Concurrent Events

Supercomputing is a key technological pillar of modern science and engineering, indispensable for solving critical problems of high complexity. However, to effectively utilize the enormously complex large-scale computer systems available today, scientists and engineers need powerful and robust softw...

Personal Name(s):	Becker, Daniel (Corresponding author)
Contributing Institute:	Jülich Supercomputing Center; JSC
Imprint:	Jülich Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag 2010
Physical Description:	XVIII, 116 pp.
Dissertation Note:	RWTH Aachen, Diss., 2010
ISBN:	978-3-89336-625-5
Document Type:	Book Dissertation / PhD Thesis
Research Program:	Computational Science and Mathematical Methods Scientific Computing
Series Title:	Schriften des Forschungszentrums Jülich : IAS Series 4
Subject (ZB):	Hochschulschrift > Dissertation (FH)
Link:	OpenAccess
	Publikationsportal JuSER

Please use the identifier: http://hdl.handle.net/2128/3787 in citations.


LEADER	06521nam a2200577 a 4500
001	10841
005	20210129210529.0
980			\|a VDB
980			\|a JUWEL
980			\|a ConvertedRecord
980			\|a phd
980			\|a I:(DE-Juel1)JSC-20090406
980			\|a UNRESTRICTED
980			\|a FullTexts
980	1		\|a FullTexts
490	0		\|0 PERI:(DE-600)2525100-4 \|a Schriften des Forschungszentrums Jülich : IAS Series \|v 4
037			\|a PreJuSER-10841
856	4		\|u http://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.pdf \|y OpenAccess
856	4		\|u http://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-1440 \|x icon-1440 \|y OpenAccess
856	4		\|u http://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-180 \|x icon-180 \|y OpenAccess
856	4		\|u http://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-640 \|x icon-640 \|y OpenAccess
970			\|a VDB:(DE-Juel1)121432
655		7	\|a Hochschulschrift \|x Dissertation (FH)
520			\|a Supercomputing is a key technological pillar of modern science and engineering, indispensable for solving critical problems of high complexity. However, to effectively utilize the enormously complex large-scale computer systems available today, scientists and engineers need powerful and robust software development tools. One technique widely used by such tools is event tracing with a broad spectrum of applications ranging from performance analysis, performance prediction and modeling to debugging. In particular, event traces are helpful in understanding the performance behavior of parallel programs since they allow the in-depth analysis of communication and synchronization patterns. The accuracy of such analyses depends on the comparability of timestamps taken on different processors and may be adversely affected by non-synchronized clocks leading to inaccurate relative event timings. Such inaccuracies may cause a given interval to appear shorter or longer than it actually was, or introduce violations of the logical event order, which requires a message to be received only after it has been sent. Inconsistent trace data may not only lead to false conclusions, for instance, when the impact of communication patterns is quantified, but may also confuse the user of trace-visualization tools by causing message arrows to point backward in time-line views. Even more strikingly, trace-analysis tools may also cease to work in a satisfactory manner if they rely on the correct order to function properly. Although linear offset interpolation can restore the consistency of the trace data to some degree, time-dependent drifts and other inaccuracies may still disarrange the original sequence of events, as shown in a study conducted as a part of this Ph.D. thesis.
The already familiar controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of local intervals. This algorithm is, however, not suitable for realistic applications because (i) it ignores collective and shared-memory operations and (ii) as a serial algorithm it offers only limited scalability. This thesis addresses these shortcomings by extending the algorithm to restore event semantics related to collective and shared-memory operations and by parallelizing the extended version to make it suitable for large-scale systems including computational grids. The basic idea behind the semantic extension is to consider collective and shared-memory operations as being composed of multiple point-to-point messages, taking the semantics of the different flavors of these operations into account. In order to accomplish the correction in a scalable way, both distributed memory and parallel processing capabilities are exploited by processing separate local trace files in parallel and replaying the original communication on as many CPUs as were used to execute the target application itself. To employ the replay mechanism in computational grids, this work also defines the necessary infrastructure to accurately measure clock offsets in distributed environments with hierarchical networks. The methodology was evaluated in practice by integrating the extended and parallelized algorithm into the Scalasca trace-analysis framework and applied to traces of realistic applications taken on single cluster systems and computational grids. The thesis shows that the algorithm eliminates inconsistent timings of concurrent events while only marginally changing the length of intervals between local events – even if wide-area communication is involved. Scalability is demonstrated with up to 4,096 application processes.
915			\|0 StatID:(DE-HGF)0510 \|2 StatID \|a OpenAccess
914	1		\|y 2010
502			\|a RWTH Aachen, Diss., 2010 \|b Dr. (FH) \|c RWTH Aachen \|d 2010
500			\|a Record converted from VDB: 12.11.2012
300			\|a XVIII, 116 S.
245			\|a Timestamp Synchronization of Concurrent Events
024	7		\|2 Handle \|a 2128/3787
024	7		\|2 URI \|a 3787
024	7		\|2 ISSN \|a 1868-8489
260			\|a Jülich \|b Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag \|c 2010
020			\|a 978-3-89336-625-5
100	1		\|0 P:(DE-Juel1)VDB62975 \|a Becker, Daniel \|b 0 \|e Corresponding author \|g male \|u FZJ
909	C	O	\|o oai:juser.fz-juelich.de:10841 \|p openaire \|p open_access \|p driver \|p VDB \|p dnbdelivery
913	2		\|0 G:(DE-HGF)POF3-511 \|1 G:(DE-HGF)POF3-510 \|2 G:(DE-HGF)POF3-500 \|a DE-HGF \|b Key Technologies \|l Supercomputing & Big Data \|v Computational Science and Mathematical Methods \|x 0
913	1		\|0 G:(DE-HGF)POF2-411 \|1 G:(DE-HGF)POF2-410 \|2 G:(DE-HGF)POF2-400 \|a DE-HGF \|b Schlüsseltechnologien \|l Supercomputing \|v Computational Science and Mathematical Methods \|x 1 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF2
536			\|a Computational Science and Mathematical Methods \|0 G:(DE-HGF)POF2-411 \|c POF2-411 \|f POF II \|x 1
536			\|a Scientific Computing \|0 G:(DE-Juel1)FUEK411 \|2 G:(DE-HGF) \|x 0 \|c FUEK411
336			\|2 DRIVER \|a doctoralThesis
336			\|a Thesis \|0 2 \|2 EndNote
336			\|a Book \|0 PUB:(DE-HGF)3 \|2 PUB:(DE-HGF)
336			\|a Dissertation / PhD Thesis \|0 PUB:(DE-HGF)11 \|2 PUB:(DE-HGF)
336			\|2 ORCID \|a DISSERTATION
336			\|2 DataCite \|a Output Types/Dissertation
336			\|2 BibTeX \|a PHDTHESIS
920			\|l yes
920			\|k Jülich Supercomputing Center; JSC \|0 I:(DE-Juel1)JSC-20090406 \|g JSC \|l Jülich Supercomputing Centre \|x 0
990			\|a Becker, D. \|0 P:(DE-Juel1)VDB62975 \|b 0 \|e Corresponding author \|g male \|u FZJ