COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores

This title appears in the Scientific Report: 2023

Personal Name(s):	Portero, Antonio (Corresponding author)
	Falquez, Carlos / Ho, Nam / Petrakis, Polydoros / Nassyr, Stepan / Marazakis, Manolis / Dolbeau, Romain / Nocua Cifuentes, Jorge A. / Beltran, Luis / Pleiter, Dirk / Suarez, Estela
Contributing Institute:	Institute for Advanced Simulation (IAS); Jülich Supercomputing Center (JSC)
Published in:	Architecture of Computing Systems - 36th International Conference
Imprint:	Cham: Springer Nature Switzerland, 2023
Physical Description:	pp. 105-119
ISBN:	978-3-031-42784-8 (print); 978-3-031-42785-5 (electronic)
DOI:	10.1007/978-3-031-42785-5_8
DOI:	10.34734/FZJ-2023-05391
Conference:	Architecture of Computing Systems - 36th International Conference, Athens (Greece), 2023-06-13 - 2023-06-15
Document Type:	Contribution to a book; Contribution to conference proceedings
Research Program:	SGA1 (Specific Grant Agreement 1) of the European Processor Initiative (EPI); Future Computing & Big Data Systems
Series Title:	Lecture Notes in Computer Science 13949

Please use the identifier: http://dx.doi.org/10.1007/978-3-031-42785-5_8 in citations.
Please use the identifier: http://dx.doi.org/10.34734/FZJ-2023-05391 in citations.

This paper explores memory subsystem design through gem5 simulations of a non-uniform memory access (NUMA) architecture whose ARM cores are equipped with vector engines and connected to a Network-on-Chip (NoC) that implements the Coherent Hub Interface (CHI) protocol. The study quantifies the benefits of vectorization, prefetching, and multichannel NoC configurations using a benchmark that generates memory access patterns, including indexed accesses. The results provide insights into improving bus utilization and bandwidth and into reducing stalls in the system. The paper proposes hardware/software (HW/SW) enhancements that achieve more than 80% utilization of the high-bandwidth memory (HBM) device at the memory controllers of the simulated many-core system.
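The abstract contrasts regular accesses with indexed ones and reports memory-controller utilization as a percentage. The Python sketch below illustrates those two ideas under stated assumptions; it is not the paper's benchmark. All function names are hypothetical, 8-byte elements and 64-byte cache lines are assumed, and the bandwidth numbers in the demo are illustrative, not results from the paper. It counts the distinct cache lines each pattern touches as a rough proxy for memory traffic, and computes utilization as achieved bandwidth divided by peak bandwidth.

```python
import random

# Hypothetical pattern generators (not the paper's benchmark). Each returns
# element offsets (not byte addresses) into a source array, in read order.

def contiguous(n):
    """Unit-stride pattern 0, 1, 2, ...; typically easy to vectorize and prefetch."""
    return list(range(n))

def strided(n, stride):
    """Constant-stride pattern 0, stride, 2*stride, ..."""
    return [i * stride for i in range(n)]

def indexed(n, table_size, seed=0):
    """Indexed (gather) pattern: seeded pseudo-random offsets into a table."""
    rng = random.Random(seed)
    return [rng.randrange(table_size) for _ in range(n)]

def gather(src, idx):
    """Reference gather kernel: dst[i] = src[idx[i]]."""
    return [src[j] for j in idx]

def cache_lines_touched(offsets, elem_bytes=8, line_bytes=64):
    """Distinct cache lines referenced (assumed 8-byte elements, 64-byte lines)."""
    return len({(o * elem_bytes) // line_bytes for o in offsets})

def utilization(achieved_gbs, peak_gbs):
    """Fraction of peak bandwidth sustained, e.g. at a memory controller."""
    if peak_gbs <= 0:
        raise ValueError("peak bandwidth must be positive")
    return achieved_gbs / peak_gbs

if __name__ == "__main__":
    n = 64
    # Same number of useful elements, very different numbers of lines fetched.
    print(cache_lines_touched(contiguous(n)))        # 8  (8 elements per line)
    print(cache_lines_touched(strided(n, 8)))        # 64 (one element per line)
    print(cache_lines_touched(indexed(n, 1 << 20)))  # at most 64; seed-dependent
    # Illustrative bandwidth figures only, not taken from the paper.
    print(f"{utilization(410.0, 512.0):.1%}")        # 80.1%
```

The demo shows why the access pattern matters for bus utilization: the unit-stride pattern uses all eight elements of every fetched line, while the stride-8 pattern brings in a full 64-byte line for each 8-byte element it actually needs.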