Description: FAIRification of electrophysiology data analysis: provenance capture in the Elephant toolbox

This title appears in the Scientific Report: 2021

Personal Name(s):	Köhler, Cristiano (Corresponding author)
Contributing Institute:	Computational and Systems Neuroscience (INM-6); Computational and Systems Neuroscience (IAS-6); Jara-Institut Brain structure-function relationships (INM-10)
Imprint:	2021
Conference:	INCF Neuroinformatics Assembly 2021, online
Document Type:	Talk (non-conference)
Research Program:	Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE); Theory, modelling and simulation; Connectivity and Activity; Neuroscientific Foundations; Digitization of Neuroscience and User-Community Building; Helmholtz Analytics Framework; Human Brain Project Specific Grant Agreement 3; Human Brain Project Specific Grant Agreement 2

The analysis of electrophysiology data typically comprises multiple steps. These often consist of several scripts executed in a specific temporal order, which take different parameter sets and use distinct data files. As the researcher adjusts the individual analysis steps to accommodate new hypotheses or additional data, the resulting workflows may become increasingly complex and undergo frequent changes. Workflow management systems can organize the execution of the scripts and capture provenance information at the level of the script (i.e., which script file was executed, and in which environment) and of the data file (i.e., which input and output files were supplied to that script). However, the resulting provenance track does not automatically provide details about the actual analysis carried out inside each script. Therefore, the final analysis results can only be understood by inspecting the source code or relying on any accompanying documentation. We focus on two open-source tools for the analysis of electrophysiology data developed in EBRAINS. The Neo (RRID:SCR_000634) framework provides an object model to standardize neural activity data acquired from distinct sources. Elephant (RRID:SCR_003833) is a Python toolbox that provides several functions for the analysis of electrophysiology data. We set out to improve these tools by implementing a data model that captures detailed provenance information and by representing the analysis results in a systematic and formalized manner. Ultimately, these developments aim to improve the reproducibility, interoperability, findability, and re-use of analysis results.
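
To illustrate the gap between script-level and function-level provenance, the following is a minimal sketch of how details of the analysis inside a script could be recorded. It is not the implementation presented in the talk and does not use the Neo or Elephant API. The names `capture_provenance`, `ProvenanceRecord`, `HISTORY`, and `toy_firing_rate` are hypothetical, and hashing the `repr` of an object is a simplifying assumption standing in for a proper content hash of the data.

```python
import functools
import hashlib
import inspect
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """One captured call: which function ran, on what, with which parameters."""
    function: str
    parameters: Dict[str, str]
    input_hashes: Dict[str, str]
    output_hash: str
    timestamp: str


# Hypothetical in-memory store; a real system would serialize the records.
HISTORY: List[ProvenanceRecord] = []


def _fingerprint(obj) -> str:
    # Assumption: repr() is stable enough to identify small toy objects.
    return hashlib.sha256(repr(obj).encode("utf-8")).hexdigest()[:12]


def capture_provenance(inputs):
    """Decorator factory; `inputs` names the arguments that carry data."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Resolve positional, keyword, and default arguments by name.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            result = func(*args, **kwargs)
            HISTORY.append(ProvenanceRecord(
                function=func.__qualname__,
                parameters={name: repr(value)
                            for name, value in bound.arguments.items()
                            if name not in inputs},
                input_hashes={name: _fingerprint(value)
                              for name, value in bound.arguments.items()
                              if name in inputs},
                output_hash=_fingerprint(result),
                timestamp=datetime.now(timezone.utc).isoformat(),
            ))
            return result
        return wrapper
    return decorator


@capture_provenance(inputs=("spike_times",))
def toy_firing_rate(spike_times, t_start=0.0, t_stop=1.0):
    """Toy stand-in for an analysis function: spikes per second in a window."""
    count = sum(1 for t in spike_times if t_start <= t < t_stop)
    return count / (t_stop - t_start)


rate = toy_firing_rate([0.1, 0.4, 0.7, 1.5], t_stop=2.0)
record = HISTORY[-1]
print(record.function, record.parameters, rate)
```

Each record states which function produced a result, from which input, and with which parameter values, including defaults that the calling script never spelled out. A script-level provenance track alone does not contain this information.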