Description: Supporting Software Engineering Practices in the Development of Data-Intensive HPC Applications with the JuML Framework

This title appears in the Scientific Report: 2017

Personal Name(s):	Götz, Markus (Corresponding author)
	Book, Matthias / Bodenstein, Christian / Riedel, Morris
Contributing Institute:	Jülich Supercomputing Center; JSC
Published in:	Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational and Data-enabled Science & Engineering
Imprint:	ACM Press 2017
Physical Description:	1-8
DOI:	10.1145/3144763.3144765
Conference:	Workshop on Software Engineering for High Performance Computing in Computational and Data-enabled Science & Engineering, Denver (USA), 2017-11-12 - 2017-11-17
Document Type:	Contribution to a book; Contribution to a conference proceedings
Research Program:	DEEP - Extreme Scale Technologies; Doctoral researcher without dedicated funding; Data-Intensive Science and Federated Computing
Link:	Restricted; Restricted; OpenAccess
	JuSER publication portal

Please use the identifier: http://hdl.handle.net/2128/26302 in citations.
Please use the identifier: http://dx.doi.org/10.1145/3144763.3144765 in citations.

The development of high-performance computing applications differs considerably from traditional software development. The difference stems from complex hardware systems, inherent parallelism, a different software lifecycle and workflow, and (especially for scientific computing applications) requirements that are partially unknown at design time. This makes the use of software engineering practices challenging, so only a small subset of them is actually applied. In this paper, we discuss the potential for applying software engineering techniques to an emerging field in high-performance computing, namely large-scale data analysis and machine learning. We argue for employing software engineering techniques in the development of such applications from the start, and for designing generic, reusable components. Using the example of the Juelich Machine Learning Library (JuML), we demonstrate how such a framework can not only simplify the design of new parallel algorithms but also increase the productivity of the actual data analysis workflow. We place particular focus on the abstraction from heterogeneous hardware, the architectural design, and aspects of parallel and distributed unit testing.