This title appears in the Scientific Report: 2015
Please use the identifier: http://hdl.handle.net/2128/9723 in citations.
Please use the identifier: http://dx.doi.org/10.1186/s12859-015-0714-x in citations.
Scientific Workflow Optimization for Improved Peptide and Protein Identification
Personal Name(s): | Holl, Sonja / Mohammed, Yassene / Zimmermann, Olav / Palmblad, Magnus (Corresponding author) |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Published in: | BMC Bioinformatics, 16 (2015), p. 284 |
Imprint: | London: BioMed Central, 2015 |
DOI: | 10.1186/s12859-015-0714-x |
PubMed ID: | 26335531 |
Document Type: | Journal Article |
Research Program: | Data-Intensive Science and Federated Computing; Computational Science and Mathematical Methods |
Link: | OpenAccess; JuSER publication portal |
Background: Peptide-spectrum matching is a common step in most data processing workflows for mass spectrometry-based proteomics. Many algorithms and software packages, both free and commercial, have been developed to address this task. However, these algorithms typically require the user to select instrument- and sample-dependent parameters, such as mass measurement error tolerances and the number of missed enzymatic cleavages. Selecting the best algorithm and parameter set for a particular dataset requires in-depth knowledge of both the data and the algorithms themselves. Most researchers therefore tend to use default parameters, which are not necessarily optimal.

Results: We have applied a new optimization framework for the Taverna scientific workflow management system (http://ms-utils.org/Taverna_Optimization.pdf) to find the best combination of parameters for a given scientific workflow performing peptide-spectrum matching. The optimizations themselves are non-trivial, as demonstrated by several phenomena that can be observed when allowing for larger mass measurement errors in sequence database searches. On-the-fly parameter optimization embedded in scientific workflow management systems enables experts and non-experts alike to extract the maximum amount of information from the data. The same workflows could be used for exploring the parameter space and comparing algorithms, not only for peptide-spectrum matching, but also for other tasks, such as retention time prediction.

Conclusion: Using the optimization framework, we were able to learn about how the data were acquired as well as about the explored algorithms. We observed a phenomenon in which many ammonia-loss b-ion spectra were identified as peptides with an N-terminal pyroglutamate and a large precursor mass measurement error. These insights could only be gained by extending the mass measurement error tolerance parameters explored by the optimization framework beyond their common range.
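The Background names two instrument- and sample-dependent parameters: the mass measurement error tolerance and the number of missed enzymatic cleavages. The following is a minimal sketch, under simplifying assumptions, of how these two parameters shape the set of candidate peptides a sequence database search considers. It assumes a simplified trypsin rule (cleave after K or R unless followed by P), no modifications, neutral monoisotopic masses from standard residue masses, and a symmetric tolerance in parts per million. All function names are hypothetical and do not correspond to any particular search engine.

```python
"""Sketch: how missed cleavages and a ppm tolerance define candidate peptides.

Hypothetical helper names; simplified trypsin rule; no modifications.
"""

# Standard monoisotopic residue masses (Da); cysteine is unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def tryptic_digest(protein: str, max_missed: int) -> list[str]:
    """Cleave after K/R (not before P), allowing up to max_missed missed cleavages."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal fragment not ending in K/R
    peptides = []
    for i in range(len(fragments)):
        # Joining up to max_missed following fragments models missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates(observed: float, peptides: list[str], tol_ppm: float) -> list[str]:
    """Peptides whose theoretical mass lies within +/- tol_ppm of the observed mass."""
    return [p for p in peptides if abs(ppm_error(observed, peptide_mass(p))) <= tol_ppm]


if __name__ == "__main__":
    peps = tryptic_digest("AAKGGRPLLK", max_missed=1)
    print(peps)  # ['AAK', 'AAKGGRPLLK', 'GGRPLLK']
    target = peptide_mass("GGRPLLK")
    print(candidates(target * (1 + 5e-6), peps, tol_ppm=10))  # ['GGRPLLK']
```

Raising `max_missed` enlarges the candidate list, and widening `tol_ppm` admits more candidates per spectrum; both increase the chance of a spurious match, which is why neither parameter has a universally best default.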
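The Results describe searching a workflow's parameter space for the combination that extracts the most information from the data. The simplest way to illustrate that idea is an exhaustive grid search. This is a sketch only, not the Taverna optimization framework itself, whose search strategy may differ. The objective `toy_identifications` is a hypothetical stand-in for running a search workflow and counting accepted peptide-spectrum matches; its formula is invented purely so the example runs, and the parameter names are likewise illustrative.

```python
"""Sketch: exhaustive grid search over workflow parameters.

The objective below is a made-up toy; in practice it would run a search
workflow and return, e.g., the number of accepted peptide-spectrum matches.
"""
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every parameter combination and return the best-scoring one."""
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # ties keep the first combination seen
            best_params, best_score = params, score
    return best_params, best_score


def toy_identifications(params: Dict[str, Any]) -> float:
    """Hypothetical objective that peaks at 10 ppm and one missed cleavage."""
    tol = params["precursor_tolerance_ppm"]
    missed = params["missed_cleavages"]
    return 1000 - (tol - 10) ** 2 - 50 * abs(missed - 1)


if __name__ == "__main__":
    grid = {"precursor_tolerance_ppm": [5, 10, 20, 50], "missed_cleavages": [0, 1, 2]}
    best, score = grid_search(grid, toy_identifications)
    print(best, score)  # {'precursor_tolerance_ppm': 10, 'missed_cleavages': 1} 1000
```

The number of evaluations is the product of the grid sizes, so cost grows multiplicatively with each added parameter. When every evaluation is a full database search, this is one reason to prefer a smarter search strategy over a brute-force grid.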
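The Conclusion reports ammonia-loss b-ion spectra being identified as peptides with an N-terminal pyroglutamate and a large precursor mass error. The arithmetic below illustrates one mass relationship consistent with that observation. It is an illustration under stated assumptions, not the analysis from the paper. It assumes both species share the same residue sequence starting with glutamine and are singly protonated. A b-ion consists of the residues plus a proton, while an intact peptide additionally carries one water. Pyroglutamate formation from glutamine and ammonia loss each remove one NH3, so the two species differ by exactly one water. The example sequence is hypothetical.

```python
"""Sketch: mass difference between an NH3-loss b-ion and a pyroGlu peptide.

Assumes identical residue sequences, charge 1+, monoisotopic masses.
"""

# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE_MASS = {"Q": 128.05858, "A": 71.03711, "L": 113.08406,
                "E": 129.04259, "K": 128.09496}
PROTON = 1.007276    # Da
WATER = 18.010565    # H2O, Da
AMMONIA = 17.026549  # NH3, Da


def residue_sum(seq: str) -> float:
    """Sum of monoisotopic residue masses."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def b_ion_minus_nh3_mz(seq: str) -> float:
    """Singly charged b-ion spanning the full sequence, after losing NH3."""
    return residue_sum(seq) + PROTON - AMMONIA


def pyroglu_peptide_mh(seq: str) -> float:
    """[M+H]+ of the peptide with its N-terminal Gln cyclized to pyroglutamate."""
    if not seq.startswith("Q"):
        raise ValueError("pyroGlu from Gln requires an N-terminal Q")
    return residue_sum(seq) + WATER + PROTON - AMMONIA


if __name__ == "__main__":
    seq = "QLEALK"  # hypothetical example sequence
    diff = pyroglu_peptide_mh(seq) - b_ion_minus_nh3_mz(seq)
    ppm = diff / pyroglu_peptide_mh(seq) * 1e6
    print(f"difference: {diff:.4f} Da ({ppm:,.0f} ppm)")
```

For this roughly 684 Da example the gap is about 18.011 Da, or on the order of 26,000 ppm. That is far outside the ppm-scale tolerances commonly used, which is consistent with such matches only surfacing once the explored tolerance range is extended.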