Description: Cray Scientific Library (libsci) and Parallelism

Cray Scientific Library (libsci) and Parallelism: Some Performance Aspects


Personal Name(s):	Jansen, Paul (Corresponding Author)
	Marx, Monika / Nagel, Wolfgang E. / Romberg, Mathilde / Vaeßen, Marga / Zimmermann, Ruth
Contributing Institute:	Zentralinstitut für Angewandte Mathematik; ZAM Jülich Supercomputing Center; JSC
Published in:	1993
Imprint:	Jülich: Zentralinstitut für Angewandte Mathematik, 1993
Physical Description:	28 p.
Document Type:	Report
Research Program:	none (no topic assigned)
	JuSER publication portal

Today, most Cray multiprocessor systems are still used in a multiprogramming environment. In such environments, two main issues determine whether production codes should exploit parallelism. First, in terms of turnaround time, the parallel program should run faster than the single-tasked version; second, the costs, i.e. CPU time from the user's point of view as well as system throughput from the computer center's point of view, should remain reasonably constant. This becomes even more important when parallelism is introduced automatically by calling optimized library routines provided by the vendor. The Cray Scientific Library (libsci) is such an example: its routines have to be highly efficient because these kernels are often used in user codes as basic building blocks for more complex algorithms. Based on libsci 7.0, the field-test version libsci 8.0, and the revised libsci 8.005, this report describes in detail the performance values obtained for some BLAS algorithms. Our results show that, for the first two libsci versions, significant overhead (up to several hundred per cent) was observed in many cases, even for large problem sizes. This was all the more critical because many algorithms provided by third-party libraries (e.g. NAG and IMSL) rely on libsci BLAS kernels. Under the UNICOS Rel. 8.0 operating system, the default number of CPUs waiting in parallel was reduced from eight to four. This change, together with further optimizations in libsci 8.005, has largely solved the problems, and this libsci release is now the production version.
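The two criteria named above, shorter turnaround time and roughly constant CPU-time cost, can be expressed as two simple ratios. The following is a minimal sketch assuming the conventional definitions: speedup as the ratio of single-tasked to multitasked wall-clock time, and overhead as the relative increase in total CPU time (summed over all CPUs) charged for the multitasked run. The function names and the timings are hypothetical illustrations, not measurements or methods taken from the report.

```python
def speedup(wall_serial: float, wall_parallel: float) -> float:
    """Turnaround-time gain: a value > 1 means the multitasked run finished sooner."""
    return wall_serial / wall_parallel


def cpu_overhead_percent(cpu_serial: float, cpu_parallel: float) -> float:
    """Extra CPU time charged for the multitasked run, in per cent of the
    single-tasked CPU time (0 means the cost stayed constant)."""
    return 100.0 * (cpu_parallel - cpu_serial) / cpu_serial


# Hypothetical timings in seconds (NOT data from the report).
wall_1, cpu_1 = 10.0, 10.0   # single-tasked: wall-clock time is about the CPU time
wall_4, cpu_4 = 4.0, 14.0    # multitasked: faster turnaround, but more CPU time in total

print(f"speedup      = {speedup(wall_1, wall_4):.2f}")                 # 2.50
print(f"CPU overhead = {cpu_overhead_percent(cpu_1, cpu_4):.0f}%")     # 40%
```

In this reading, an overhead of "several hundred per cent" corresponds to a multitasked run whose total CPU time is several times that of the single-tasked run, for example 40 s instead of 10 s (300%), even if its wall-clock time is shorter.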