This title appears in the Scientific Report : 2015

Scalable and parallel machine learning algorithms for statistical data mining - Practice & experience

Personal Name(s):	Riedel, Morris (Corresponding author)
	Goetz, M. / Richerzhagen, M. / Glock, P. / Bodenstein, C. / Memon, Ahmed / Memon, Mohammad Shahbaz
Contributing Institute:	Jülich Supercomputing Center; JSC
Published in:	2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) : [Proceedings] - IEEE, 2015. - ISBN 978-9-5323-3082-3
Imprint:	IEEE 2015
Physical Description:	204 - 209
DOI:	10.1109/MIPRO.2015.7160265
Conference:	38th International Convention on Information and Communication Technology, Electronics and Microelectronics, Opatija (Croatia), 2015-05-25 - 2015-05-29
Document Type:	Contribution to a conference proceedings
Research Program:	Data-Intensive Science and Federated Computing
	JuSER publication portal

Please use the identifier: http://dx.doi.org/10.1109/MIPRO.2015.7160265 in citations.

Many scientific datasets (e.g. in the earth sciences and medical sciences) are growing in volume or in dimensionality owing to the ever-increasing quality of measurement devices. This contribution focuses specifically on how such datasets can take advantage of new "big data" technologies and frameworks, which are often based on parallelization methods. A substantial part of the contribution covers lessons learned from medical and earth science data applications that require parallel clustering and classification techniques such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN). Selected experiences with related "big data" approaches and concrete mining techniques (e.g. dimensionality reduction, feature selection, and feature extraction methods) are also addressed. To overcome the identified challenges, we outline an architectural framework design, implemented with openly available tools, that enables scalable and parallel machine learning applications in distributed systems.