This title appears in the Scientific Report: 2017
Please use the identifier: http://hdl.handle.net/2128/15949 in citations.
Modular Supercomputing: the DEEP approach to hardware heterogeneity
Personal Name(s): | Suarez, Estela (Corresponding author) |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Imprint: | 2017 |
Conference: | 8th International Supercomputing Conference In Mexico, Guadalajara (Mexico), 2017-02-27 - 2017-03-03 |
Document Type: | Conference Presentation |
Research Program: | DEEP - Extreme Scale Technologies; DEEP Extended Reach; Dynamical Exascale Entry Platform; Supercomputer Facility |
Link: | OpenAccess; Publikationsportal JuSER |
The way HPC systems are built has changed over the decades. Originally, special-purpose components were designed to build unique systems tailored to the specific requirements of a given user community. Although this approach persists to some extent, the large majority of HPC systems today are clusters: systems made of standard components connected to each other through high-speed networks. The main reason for the popularity of cluster systems is their lower cost. To also reduce overall energy consumption, the trend in recent years has been to build increasingly heterogeneous systems that combine general-purpose CPUs with GPGPUs or many-core processors at the node level.

The European-funded projects DEEP (www.deep-project.eu) and DEEP-ER (www.deep-er.eu) proposed and demonstrated an alternative approach in which heterogeneity is introduced not at the node level but at the system level. The “Cluster-Booster” architecture connects a standard, homogeneous Xeon cluster with a “Booster”: a cluster made entirely of many-core devices. The “DEEP System” is the first hardware prototype of this concept: a 500 TFlop/s peak performance prototype consisting of a 128-node Xeon Cluster and a 384-node Xeon Phi (KNC) Booster. The newer “DEEP-ER Prototype” updates the processor technology on both the Cluster and Booster sides (now with KNL) and introduces an entirely new memory hierarchy based on non-volatile memory technology to support new I/O and checkpointing capabilities.

Both DEEP and DEEP-ER, however, do much more than build a hardware prototype. A complete software stack has been created to make the prototypes operational at production level. The software development focused on providing the best performance, hiding the hardware complexity from users, and easing the task of porting application code to the new architecture. For this purpose, a programming environment based on de facto standard components (MPI+OpenMP) has been adapted and extended with new offloading functionality. In this way, applications can run on the systems easily while their codes remain fully portable. To ensure that the hardware and software functionality of the DEEP and DEEP-ER projects fulfills the requirements of HPC applications, all project developments have been guided via co-design by a varied portfolio of 11 applications, which were also adapted to the architecture to demonstrate its potential.

The DEEP and DEEP-ER projects laid down the first milestones in an architectural approach leading to what we call the “Modular Supercomputer Architecture”. In this concept, compute modules (each one a parallel cluster system of potentially large size) with different performance characteristics are integrated into a single heterogeneous system, so that applications run distributed over the various modules depending on the kinds of resources they need. This approach is ideal for supercomputer centers running heterogeneous application mixes and offers valuable flexibility to compute providers, allowing the set of modules and their respective sizes to be tailored to actual usage.
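The idea of distributing one application over several modules can be illustrated with a minimal Python sketch. It is not the DEEP software stack or its scheduler: the `Module`, `Part`, and `place` names, the `kind` labels, and the example application parts are all hypothetical, and the first-fit matching rule is an assumption chosen for brevity. Only the node counts (128 Cluster nodes, 384 Booster nodes) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    kind: str        # hypothetical label for the module's performance profile
    free_nodes: int


@dataclass(frozen=True)
class Part:
    name: str
    needs: str       # kind of resource this part of the application asks for
    nodes: int


def place(parts, modules):
    """First-fit placement: assign each part to the first module of the
    requested kind that still has enough free nodes.
    Returns a dict mapping part name to module name."""
    free = {m.name: m.free_nodes for m in modules}
    placement = {}
    for part in parts:
        for m in modules:
            if m.kind == part.needs and free[m.name] >= part.nodes:
                free[m.name] -= part.nodes
                placement[part.name] = m.name
                break
        else:
            # No module of the right kind has enough free nodes.
            raise ValueError(f"no module can host {part.name!r}")
    return placement


# Node counts taken from the DEEP System described in the abstract.
modules = [
    Module("cluster", kind="general-purpose", free_nodes=128),
    Module("booster", kind="many-core", free_nodes=384),
]

# Hypothetical application split into two parts with different needs.
parts = [
    Part("io_and_control", needs="general-purpose", nodes=16),
    Part("compute_kernel", needs="many-core", nodes=256),
]

placement = place(parts, modules)
print(placement)
```

In this sketch the part with general-purpose needs lands on the Cluster and the highly parallel part on the Booster. A real system would express the same split through the resource manager and the MPI-based offloading mentioned in the abstract rather than through a lookup like this one.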