This title appears in the Scientific Report 2018. Please use the identifier http://hdl.handle.net/2128/19474 in citations.
Modular Science: Towards Online Multi Application Coordination on Inhomogeneous High Performance Computing and Neuromorphic Hardware Systems
Personal Name(s): Klijn, Wouter (corresponding author); Diaz, Sandra; Morrison, Abigail; Schenck, Wolfram; Weyers, Benjamin; Peyser, Alexander
Contributing Institute: Jülich Supercomputing Centre (JSC); Computational and Systems Neuroscience (INM-6); JARA-HPC
Imprint: 2018
Conference: 27th Annual Computational Neuroscience Meeting, Seattle (USA), 2018-07-13 - 2018-07-18
Document Type: Poster
Research Program: SimLab Neuroscience; Supercomputing and Modelling for the Human Brain; Computational Science and Mathematical Methods
Neuroscience is an interdisciplinary field with collaborators from biology, medicine, physics, mathematics, computer science and engineering. The complexity of the brain indicates that only with the collaboration and integration of knowledge from diverse fields will we be able to gain significant traction towards a better understanding of its structure and function. This reality also makes neuroscience a domain where specialized tools to solve computationally demanding problems at different scales play a fundamental role. Supercomputers are currently an important tool for simulation and data processing in most areas of science. The need to study the brain using multiple specialized software packages is giving rise to complex workflows which also require expert monitoring and interaction. High-performance computing (HPC) workflows in neuroscience are already important in several subdomains, including:

1. Image processing: requiring big data processing and storage, automatic brain tissue segmentation, identification, quality control and reconstruction.
2. Brain simulation: from molecules to neurons, networks and brain regions, simulations of the brain constitute a source for the analysis and generation of new hypotheses regarding the roles of connectivity, topology, morphology and communication in function and cognition.
3. Visualization: the translation from data to information is one of the most important steps in a workflow, as it allows the scientist to quickly assess the success of the analysis.
4. Data storage: from high-resolution images to millions of spiking events produced per second, data storage is one of the biggest bottlenecks to solve in HPC neuroscientific workflows.
5. Large parameter space exploration: our models are imperfect, the amount of experimental data we have access to for the brain is too small to constrain most models, and in general deriving parameters for dynamical systems is computationally intractable. This forces us to perform large parameter searches in order to fit our models to what we measure in the laboratories. Parameter spaces can be so large that a search for meaningful combinations becomes intractable, requiring adaptive and efficient algorithms to guide these explorations. Interactive monitoring by an expert may also be desirable in many of these complex searches.

Such interactive supercomputing [1] enables scientists to gain more insight into the impact that each element in the model has on the observed outcome, and is a goal of both the Human Brain Project and exascale computing plans around the world. The live execution of workflow pipelines is desirable due to the large amount of intermediate data generated, with inefficient or even intractable storage requirements, the need for expert human graphical interaction with the systems, and interaction with multiscale systems working at different time scales or with systems in contact with experiments.

Launching applications on HPC systems such as clusters and supercomputers requires interaction with scheduling systems, setting up software and configuring working environments; a minimal sketch of such a scheduler interaction follows below.
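To make this concrete, the following is an illustrative sketch of programmatic job submission, assuming a SLURM scheduler and a hypothetical NEST simulation script (run_network.py); the module name, paths and resource limits are assumptions and would have to be adapted to the concrete system. It is not part of the Modular Science framework itself.

```python
import subprocess

# Illustrative sketch: submit a hypothetical NEST simulation script to
# a SLURM scheduler. Each line of the job definition is a potential
# failure source that must be correct for the deployment to succeed.
job_script = """#!/bin/bash
#SBATCH --job-name=nest-sim
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=01:00:00

module load nest            # environment setup (assumed module name)
srun python run_network.py  # the actual simulation step
"""

with open("nest_sim.job", "w") as f:
    f.write(job_script)

# sbatch prints e.g. "Submitted batch job <id>" on success; a non-zero
# return code means the scheduler rejected the job definition.
result = subprocess.run(["sbatch", "nest_sim.job"],
                        capture_output=True, text=True)
print(result.stdout or result.stderr)
```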
A successful deployment of a job on a supercomputer (the execution of a defined set of instructions) depends on the accurate definition of paths to libraries, access to data input/output, availability of required computational resources, correct definition of the job within the limits imposed by the scheduler, and the correct execution of the instructions enclosed in the job. Each of these dependencies can be a source of problems which prevent the job execution, requiring the application neuroscientist to debug the pipeline. If this job is now part of a complex workflow with increasing numbers of software and hardware components, the dependencies multiply and the potential for failure increases.

In order to provide the scientific community with new tools for the reliable and efficient execution of complex and interactive workflows, we have conceptualized "Modular Science". The Modular Science workflow is a software and social interaction contract for the deployment of complex scientific workflows on supercomputers. It can be seen as an orchestrator for scientific applications. First, it is a software contract because it defines how interfaces must be packaged, described and shared between different steps, and it defines the operative limitations of each job. Second, it is a social contract because it helps scientists and engineers agree on formats, infrastructure, environment setups, limitations and desired behavior of these workflows. Orchestrating the harmonious execution of HPC software, which is usually developed by different partners and comprises multiple applications, is not trivial. The Modular Science orchestrator (a software component, Figure 1) serves as a base from which each scientific domain can develop agreements on how to better work with and exploit the available computational resources, minimizing the risk of failure of these complex workflows and enabling new science. The nature of the framework also allows the monitoring of basic variables within these workflows, such as data bandwidth, memory consumption and error tracking through consolidated logging. A diagram of the relationships between the different applications and the Modular Science orchestrator can be seen in Figure 1.

The Modular Science orchestrator treats the execution of a scientific workflow as a staged process in order to enhance its robustness and probability of success with minimal effort and loss of resources. For this purpose the orchestrator, making use of envelope software which attaches to and interacts with each job in the workflow, tests the critical dependencies for execution of each job independently before deployment. It then tests for critical shared dependencies among two or more elements in the workflow: connection channels, software libraries, input/output paths and sources, privileges, and correct job configuration according to the limitations imposed by the specific scheduler on the supercomputer of choice. Once these dependencies have been tested on a single node, a second stage with all required resources allocated starts. Dependencies are verified again for large-scale deployment, and a final green light is given for the execution of the full workflow. If something is not right in the test stages, the deployment is canceled, notifying the user and saving computing resources. This concept is illustrated in Figure 1 for two applications in a workflow, and a minimal code sketch of such staged checking is given below.

In this work we present the Modular Science framework and the concrete set of use cases which are guiding its development.
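As referenced above, the sketch below illustrates what the staged pre-flight checks could look like. All names (JobSpec, check_job, check_shared, deploy) are hypothetical illustrations under our own assumptions, not the actual Modular Science API.

```python
import importlib.util
import os
import socket
from dataclasses import dataclass, field

# Hypothetical job description; a real envelope would carry more
# metadata (scheduler limits, privileges, resource requests, etc.).
@dataclass
class JobSpec:
    name: str
    required_modules: list = field(default_factory=list)  # importable libraries
    io_paths: list = field(default_factory=list)          # input/output locations
    peers: list = field(default_factory=list)             # (host, port) channels

def check_job(job: JobSpec) -> list:
    """Stage 1: test each job's critical dependencies independently."""
    errors = []
    for mod in job.required_modules:
        if importlib.util.find_spec(mod) is None:
            errors.append(f"{job.name}: missing library '{mod}'")
    for path in job.io_paths:
        if not os.path.exists(path):
            errors.append(f"{job.name}: unreachable I/O path '{path}'")
    return errors

def check_shared(jobs: list) -> list:
    """Stage 2: test shared dependencies, e.g. connection channels."""
    errors = []
    for job in jobs:
        for host, port in job.peers:
            try:
                socket.create_connection((host, port), timeout=2).close()
            except OSError:
                errors.append(f"{job.name}: cannot reach {host}:{port}")
    return errors

def deploy(jobs: list):
    """Give the green light only if all test stages pass."""
    errors = [e for job in jobs for e in check_job(job)] + check_shared(jobs)
    if errors:
        for e in errors:
            print("preflight failed:", e)
        return  # deployment canceled: user notified, resources saved
    print("green light: launching full workflow")
```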
A generic mapping of these workflows onto our framework can be seen in Figure 3. Our first use case involves the interactive generation of neural network models using connectivity based on experimental data, executed on the NEST [2] simulator. Our second use case considers the setup of a simulation executed in NEST, interacting with a simulation in Arbor [3]; the output is processed in Elephant [4] and also used to calculate local field potentials [5]. The third use case encompasses the generation of a full brain simulation using TVB [6] neural mass models, based on connectivity from DTI experiments. Simulation results must be compared against functional experimental data to iteratively refine the model. The process is observed by an expert who interactively controls the parameter optimization by observing the results and evaluating the fitness of each simulation instance. In our fourth use case, we plan to enable a full brain simulation using neural mass models at the global scale, coupled to local representations of specific regions simulated in NEST. Firing rate output from the neural mass models is translated into spiking input for the detailed neuron-scale simulations (a minimal sketch of this translation is given after the references); the output of the NEST simulation is afterwards processed with Elephant for further analysis.

The development of the Modular Science framework is at an early stage; here we present results from the initial proof of concept in a multiscale simulation workflow as deployed on the Jülich Supercomputing Centre's infrastructure. Our framework is open source, deployable on architectures from local laboratory clusters to supercomputing centers, and compatible with most scheduling systems. It will therefore not only benefit the neuroscience community but can, with little effort, also be used in other fields. We aim to provide the neuroscientific community with a new way to interact with, share and exploit the capacity of specialized software on HPC in a consistent framework. This framework will support the reproducibility of complex scientific workflows and a robust yet efficient usage of available HPC and data storage resources. Such a framework will be crucial to attack new, large-scale neuroscientific problems requiring complex multiscale workflows combining pluggable simulators and analytical tools.

References

[1] Thomas Lippert and Boris Orth. Supercomputing infrastructure for simulations of the human brain.
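As referenced in the fourth use case above, the following is a minimal sketch of how a time-varying firing rate from a neural mass model could be translated into spike trains via a per-bin Poisson approximation. The function name, parameters and the example values are our illustrative assumptions; the actual coupling code in the framework may differ.

```python
import numpy as np

def rate_to_spikes(rate_hz, dt_s, n_neurons, seed=0):
    """Translate a time-varying population firing rate (Hz) into spike
    trains: in each bin of width dt, a neuron fires with probability
    rate * dt (a Poisson approximation, valid for rate * dt << 1).

    rate_hz : 1-D array, firing rate of the neural mass model per bin.
    Returns a boolean spike array of shape (n_neurons, len(rate_hz)).
    """
    rng = np.random.default_rng(seed)
    p = np.clip(np.asarray(rate_hz) * dt_s, 0.0, 1.0)  # spike probability per bin
    return rng.random((n_neurons, p.size)) < p         # independent draws per neuron

# Example: a 10 Hz baseline with a brief 50 Hz burst at 0.1 ms resolution,
# feeding 100 spike sources (e.g., spike generators in NEST).
dt = 1e-4
rate = np.full(10000, 10.0)
rate[4000:5000] = 50.0
spikes = rate_to_spikes(rate, dt, n_neurons=100)
spike_times = [np.nonzero(train)[0] * dt for train in spikes]  # in seconds
print(f"mean rate: {spikes.sum() / (100 * 10000 * dt):.1f} Hz")  # ~13 Hz expected
```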