This title appears in the Scientific Report: 2023
Please use the identifier: http://dx.doi.org/10.34734/FZJ-2023-05235 in citations.
Please use the identifier: http://dx.doi.org/10.5281/ZENODO.8355962 in citations.
Distributed data management for large collaborative projects: DataLad ecosystem in Collaborative Research Center 1451
| Personal Name(s): | Szczepanik, Michał (Corresponding author) / Heunis, Stephan / Mönch, Christian / Wagner, Adina / Waite, Alexander Q. / Waite, Laura / Hanke, Michael |
|---|---|
| Contributing Institute: | Gehirn & Verhalten; INM-7 |
| Imprint: | 2023 |
| DOI: | 10.34734/FZJ-2023-05235 |
| DOI: | 10.5281/ZENODO.8355962 |
| Conference: | INCF Neuroinformatics Assembly 2023, online (Sweden), 2023-09-18 - 2023-09-20 |
| Document Type: | Poster |
| Research Program: | Datenmanagement für computergestützte Modellierung (INF) Neuroscientific Data Analytics and AI |
| Link: | OpenAccess; Publikationsportal JuSER |
Multi-site research projects offer a unique opportunity for scientific insight based on data collected across different modalities, paradigms, and species. Yet they also pose unique research data management (RDM) challenges. Here, we present software developments and lessons learned from the information management project of CRC1451. Given the large variability of RDM demands across more than 20 CRC member projects, we opted for a decentralized approach: projects retain full control over key data management decisions (standards, storage, sharing), and the findability, accessibility, interoperability, and reusability of their data are achieved with DataLad as an overlay structure for all distributed datasets. We use DataLad Catalog to generate an online data portal based on metadata. Metadata extraction is done using MetaLad, following the iterative 'capture immediately, curate perpetually' approach. To mitigate DataLad’s limited adoption outside central projects, we are developing two solutions. First, DataLad Gooey is a graphical user interface for basic data management operations. Second, DataLad Tabby is a format specification and a collection of tools for dataset descriptions that can be created and provided as a spreadsheet, using well-defined terms, and that are translatable to catalog records and linked data objects.
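
The idea of a spreadsheet-based dataset description that is translatable to a linked data object can be illustrated with a minimal sketch. It uses only the Python standard library and does not use DataLad Tabby itself; the sheet layout, the term names, the `TERM_MAP` mapping, and the `sheet_to_record` helper are all hypothetical and are not taken from the Tabby specification.

```python
import csv
import io
import json

# Hypothetical sheet: a term in the first column, one or more values after it.
# Layout and term names are illustrative assumptions, not the Tabby format.
SHEET = (
    "title\tExample dataset\n"
    "author\tDoe, Jane\tRoe, Richard\n"
    "license\tCC-BY-4.0\n"
)

# Hypothetical mapping from sheet terms to schema.org-style keys.
TERM_MAP = {"title": "name", "author": "author", "license": "license"}


def sheet_to_record(text):
    """Turn a tab-separated description sheet into a JSON-LD-like dict."""
    record = {"@context": "https://schema.org", "@type": "Dataset"}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        term = row[0].strip()
        values = [v for v in row[1:] if v.strip()]
        if term not in TERM_MAP:
            # Only well-defined terms are accepted.
            raise ValueError(f"unknown term: {term}")
        # A single value stays a scalar; several values become a list.
        record[TERM_MAP[term]] = values[0] if len(values) == 1 else values
    return record


record = sheet_to_record(SHEET)
print(json.dumps(record, indent=2))
```

Restricting the sheet to a fixed vocabulary is what makes the translation mechanical: each known term maps to exactly one key in the output record, and unknown terms are rejected rather than guessed.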