This title appears in the Scientific Report: 2016
Please use the identifier: http://hdl.handle.net/2128/12206 in citations.
Please use the identifier: http://dx.doi.org/10.12694/scpe.v17i2.1160 in citations.
Enabling Scalable Data Processing and Management through Standards-based Job Execution and the Global Federated File System
Personal Name(s): | Memon, Mohammad Shahbaz (Corresponding author) / Riedel, Morris (Corresponding author) / Memon, Ahmed / Koeritz, Chris / Grimshaw, Andrew / Neukirchen, Helmut |
---|---|
Contributing Institute: | Jülich Supercomputing Center; JSC |
Published in: | Scalable computing, 17 (2016) 2, pp. 115-128 |
Imprint: | 2016 |
DOI: | 10.12694/scpe.v17i2.1160 |
Document Type: | Journal Article |
Research Program: | Data-Intensive Science and Federated Computing |
Link: | OpenAccess |
Publikationsportal JuSER | |
An emerging challenge for scientific communities is to efficiently process big data obtained from experiments and computational simulations. Supercomputing architectures are available to support scalable, high-performance processing environments, but many existing algorithm implementations are still unable to cope with their architectural complexity. One approach is to use innovative technologies that make effective use of these resources and also handle large, geographically dispersed datasets. Those technologies should be accessible in such a way that data scientists running data-intensive computations do not have to deal with the technical intricacies of the underlying execution system. Our work primarily focuses on providing data scientists with transparent access to these resources so that they can easily analyze data. We demonstrate the impact of our work by describing how we enabled access to multiple high-performance computing resources through an open standards-based middleware that takes advantage of the unified data management provided by the Global Federated File System. Our architectural design and its associated implementation are validated by a use case that requires massively parallel DBSCAN outlier detection on a 3D point cloud dataset.