This title appears in the Scientific Report: 2015
Please use the identifier http://dx.doi.org/10.1109/MIPRO.2015.7160265 in citations.
Scalable and parallel machine learning algorithms for statistical data mining - Practice & experience
Personal Name(s): Riedel, Morris (Corresponding author) / Goetz, M. / Richerzhagen, M. / Glock, P. / Bodenstein, C. / Memon, Ahmed / Memon, Mohammad Shahbaz
Contributing Institute: Jülich Supercomputing Center (JSC)
Published in: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO): [Proceedings] - IEEE, 2015. - ISBN 978-9-5323-3082-3
Imprint: IEEE, 2015
Physical Description: pp. 204-209
DOI: 10.1109/MIPRO.2015.7160265
Conference: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, Opatija (Croatia), 2015-05-25 to 2015-05-29
Document Type: Contribution to a conference proceedings
Research Program: Data-Intensive Science and Federated Computing
Publication portal: JuSER
Many scientific datasets (e.g. in the earth and medical sciences) are growing in volume or in dimensionality because of the ever-increasing quality of measurement devices. This contribution focuses specifically on how such datasets can take advantage of new 'big data' technologies and frameworks, which are often based on parallelization methods. A substantial part of the contribution consists of lessons learned from medical and earth science data applications that require parallel classification and clustering techniques, such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN). In addition, selected experiences with related 'big data' approaches and concrete mining techniques (e.g. dimensionality reduction, feature selection, and feature extraction methods) are also addressed. To overcome the identified challenges, we outline an architecture framework design that we implement with openly available tools to enable scalable and parallel machine learning applications in distributed systems.