This title appears in the Scientific Report 2023.
Please use the identifier: http://dx.doi.org/10.34734/FZJ-2023-04901 in citations.
Please use the identifier: http://dx.doi.org/10.1038/s42003-023-05244-9 in citations.
RNA contact prediction by data efficient deep learning
Personal Name(s): Taubert, Oskar; von der Lehr, Fabrice; Bazarova, Alina; Faber, Christian; Knechtges, Philipp; Weiel, Marie; Debus, Charlotte; Coquelin, Daniel; Basermann, Achim; Streit, Achim; Kesselheim, Stefan; Götz, Markus; Schug, Alexander (Corresponding author)
Contributing Institute: Jülich Supercomputing Center (JSC)
Published in: Communications Biology, 6 (2023) 1, p. 913
Imprint: London: Springer Nature, 2023
DOI: 10.34734/FZJ-2023-04901
DOI: 10.1038/s42003-023-05244-9
Document Type: Journal Article
Research Program: Helmholtz AI Consultant Team FB Information; Helmholtz Analytics Framework; Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups; Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups
Link: OpenAccess; JuSER publication portal
On the path to a full understanding of the structure-function relationship of RNA, or even to RNA design, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, here we focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines self-supervised pre-training on unlabeled data with an XGBoost classifier that makes efficient use of the sparse labeled data. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. To demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.
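The abstract treats contact maps as a proxy for 3D structure. The Python sketch below is illustrative only and is not BARNACLE's code. It assumes one representative 3D point per nucleotide, a hypothetical distance cutoff `threshold` and a hypothetical minimum sequence separation `min_separation`; the helper names `contact_map` and `top_k_precision` are likewise hypothetical. The second function scores a predicted map by the precision of its k highest-ranked pairs, an evaluation commonly used for contact prediction (often with k set to the sequence length L or a fraction of it).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(
    coords: Sequence[Coord],
    threshold: float = 8.0,
    min_separation: int = 4,
) -> List[List[int]]:
    """Symmetric binary contact map from one representative point per nucleotide.

    Assumption (hypothetical definition): nucleotides i and j are "in contact"
    if their points lie closer than `threshold` (same units as `coords`) and
    |i - j| >= `min_separation`.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs that are trivially close because they are sequence neighbours.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def top_k_precision(
    scores: Sequence[Sequence[float]],
    cmap: Sequence[Sequence[int]],
    k: int,
    min_separation: int = 4,
) -> float:
    """Fraction of the k highest-scoring pairs (i < j) that are true contacts."""
    n = len(cmap)
    candidates = [
        (scores[i][j], i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
    ]
    # Highest predicted scores first; ties are broken deterministically by pair index.
    candidates.sort(reverse=True)
    top = candidates[: max(0, k)]
    if not top:
        return 0.0
    return sum(cmap[i][j] for _, i, j in top) / len(top)


# Toy U-shaped "structure": six points spaced 3 units apart (hypothetical data).
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (6, 3, 0), (3, 3, 0), (0, 3, 0)]
true_map = contact_map(coords, threshold=4.0)

# Hypothetical predictor scores; only the upper triangle (i < j) is read.
scores = [[0.0] * 6 for _ in range(6)]
scores[0][5], scores[0][4], scores[1][5] = 0.9, 0.5, 0.2

print(top_k_precision(scores, true_map, k=1))  # 1.0
print(top_k_precision(scores, true_map, k=2))  # 0.5
```

Because the map is symmetric and binary, predicting it amounts to classifying nucleotide pairs as contact or non-contact, which is the kind of per-pair decision a classifier such as XGBoost can make from pair features; how BARNACLE builds those features from its pre-trained model is not reproduced here.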