This title appears in the Scientific Report: 2015
Comparing and Evaluating Clustering Methods for Protein Simulations
Personal Name(s): | Meinke, Jan (Corresponding author) |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Imprint: | 2015 |
Conference: | Scientific Computing with Python 2015, Austin (USA), 2015-07-06 - 2015-07-12 |
Document Type: | Poster |
Research Program: | Computational Science and Mathematical Methods |
Publikationsportal JuSER |
Understanding protein folding is a prerequisite for understanding diseases like Alzheimer's, Parkinson's, Mad Cow, and many others. Simulations have contributed significantly to our knowledge of protein folding. The volume and complexity of the data generated by these simulations, however, require new ways to analyze them. Clustering protein structures offers a way of projecting the data onto a manageable set, but it is a difficult task. The large number of coordinates (dimensions), which may be periodic, makes it algorithmically challenging, and the large number of structures makes it computationally challenging. A variety of partitional and hierarchical clustering algorithms are available in Python, e.g., in scipy, sklearn, and msmbuilder. Density-based algorithms can be found, e.g., in sklearn and in individual packages such as pymafia. This talk presents a comparison of the quality, speed, and complexity of different clustering algorithms available in Python, including the subspace clustering algorithm MAFIA, for clustering protein structures. The quality of the clusters is evaluated in terms of similarity measures as well as physical properties of the structures within a cluster. Finally, an example of an analysis based on clustering of a Monte Carlo simulation of the folding of a small protein is presented.
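The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the code behind the poster. It assumes synthetic two-angle data with made-up state centers, maps the periodic angles to (cos, sin) pairs as one common way of handling periodicity, and compares one partitional, one hierarchical, and one density-based algorithm from sklearn by run time, adjusted Rand index, and silhouette score. All function names and parameter values (`make_dihedral_data`, `embed_periodic`, `compare`, `eps=0.2`, and so on) are hypothetical choices for illustration. MAFIA is left out because the abstract does not describe the pymafia interface.

```python
import time

import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score


def make_dihedral_data(n_per_state=200, seed=0):
    """Draw synthetic (phi, psi)-like angles in radians around three made-up states."""
    rng = np.random.default_rng(seed)
    # Hypothetical state centers; the third one sits on the periodic boundary.
    centers = np.array([[-1.0, -0.8], [-2.0, 2.4], [np.pi, 0.5]])
    angles = np.concatenate(
        [c + 0.15 * rng.standard_normal((n_per_state, 2)) for c in centers]
    )
    labels = np.repeat(np.arange(len(centers)), n_per_state)
    # Wrap into [-pi, pi), which splits the third state across the boundary.
    angles = (angles + np.pi) % (2 * np.pi) - np.pi
    return angles, labels


def embed_periodic(angles):
    """Map each angle to (cos, sin) so that -pi and +pi become neighbors."""
    return np.hstack([np.cos(angles), np.sin(angles)])


def compare(features, true_labels):
    """Run three clustering algorithms and collect timing and quality measures."""
    algorithms = {
        "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
        "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
        "dbscan": DBSCAN(eps=0.2, min_samples=10),
    }
    results = {}
    for name, algorithm in algorithms.items():
        start = time.perf_counter()
        predicted = algorithm.fit_predict(features)
        elapsed = time.perf_counter() - start
        # DBSCAN marks noise points with the label -1; do not count it as a cluster.
        n_clusters = len(set(predicted) - {-1})
        results[name] = {
            "seconds": elapsed,
            "n_clusters": n_clusters,
            # Similarity to the known synthetic states.
            "ari": adjusted_rand_score(true_labels, predicted),
            # Internal quality measure; noise points are treated as their own group.
            "silhouette": (
                silhouette_score(features, predicted)
                if n_clusters > 1
                else float("nan")
            ),
        }
    return results


if __name__ == "__main__":
    angles, states = make_dihedral_data()
    for name, row in compare(embed_periodic(angles), states).items():
        print(name, row)
```

Real simulation data would replace the synthetic angles, and the physical properties of the structures in each cluster would still have to be checked separately, as the abstract notes.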