This title appears in the Scientific Report :
2022
Please use the identifier:
http://dx.doi.org/10.1016/j.parco.2022.102952 in citations.
Please use the identifier: http://hdl.handle.net/2128/31621 in citations.
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers
Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed random graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, w...
Saved in:
Please use the identifier: http://hdl.handle.net/2128/31621 in citations.
Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed random graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called spike, via its outgoing synapses to the corresponding target neurons. In distributed computing these targets are scattered across thousands of parallel processes. The spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance such as software-induced prefetching and software pipelining can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned with an intrinsically parallel computational problem can alleviate the von Neumann bottleneck of conventional computer architectures. |