This title appears in the Scientific Report: 2015

Portable Node-Level Performance Optimization for the Fast Multipole Method
Beckmann, Andreas (Corresponding author)
Kabadshow, Ivo
Jülich Supercomputing Center; JSC
Recent Trends in Computational Engineering - CE2014
Cham: Springer International Publishing, 2015
pp. 29-46
978-3-319-22996-6
978-3-319-22997-3 (electronic)
10.1007/978-3-319-22997-3_2
3rd International Workshop on Computational Engineering, Stuttgart (Germany), 2014-10-06 - 2014-10-10
Contribution to a conference proceedings
Highly Scalable Unified Long-Range Electrostatics and Flexible Ionization for Realistic Biomolecular Simulations on the Exascale
Computational Science and Mathematical Methods
Lecture Notes in Computational Science and Engineering 105
Please use the identifier: http://dx.doi.org/10.1007/978-3-319-22997-3_2 in citations.
This article provides an in-depth analysis of, and high-level C++ optimization strategies for, the most time-consuming kernels of a Fast Multipole Method (FMM). The two main kernels of a Coulomb FMM are formulated to support different optimization techniques and hardware features, such as loop unrolling, vectorization, and threading, without the need to rewrite the kernels in intrinsics or even assembly. This abstract description of the algorithm automatically enables optimal node-level performance on a broad class of available hardware platforms. Most of the presented optimization schemes also allow a generic, and hence platform-independent, description of other kernels.
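As a rough illustration of what a kernel written without intrinsics can look like, the following minimal sketch evaluates a direct Coulomb sum with the scalar type and the unroll factor as template parameters. It is not taken from the paper: the function name `coulomb_potential`, its parameters, and the choice of a direct particle sum are assumptions made only for this example, and the authors' actual kernels and abstractions may differ substantially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example, not the kernels from the paper.
// Real   : scalar type (e.g. float or double), chosen at compile time.
// Unroll : number of independent accumulators; the inner loop has a fixed
//          trip count, so the compiler is free to unroll and vectorize it.
// Computes sum_i q[i] / |r_i - p| for a target point p = (px, py, pz).
// Assumption: the target point does not coincide with any source point.
template <typename Real, std::size_t Unroll>
Real coulomb_potential(Real px, Real py, Real pz,
                       const std::vector<Real>& x,
                       const std::vector<Real>& y,
                       const std::vector<Real>& z,
                       const std::vector<Real>& q)
{
    static_assert(Unroll > 0, "Unroll must be at least 1");
    assert(x.size() == y.size() && y.size() == z.size() && z.size() == q.size());

    const std::size_t n = x.size();
    const std::size_t blocked = n - n % Unroll;  // largest multiple of Unroll <= n

    Real acc[Unroll] = {};  // independent partial sums, zero-initialized

    for (std::size_t i = 0; i < blocked; i += Unroll) {
        for (std::size_t u = 0; u < Unroll; ++u) {  // compile-time trip count
            const Real dx = x[i + u] - px;
            const Real dy = y[i + u] - py;
            const Real dz = z[i + u] - pz;
            acc[u] += q[i + u] / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    // Combine the partial sums.
    Real sum = Real(0);
    for (std::size_t u = 0; u < Unroll; ++u) {
        sum += acc[u];
    }

    // Handle the remaining n % Unroll sources.
    for (std::size_t i = blocked; i < n; ++i) {
        const Real dx = x[i] - px;
        const Real dy = y[i] - py;
        const Real dz = z[i] - pz;
        sum += q[i] / std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    return sum;
}
```

A call such as `coulomb_potential<double, 4>(px, py, pz, x, y, z, q)` selects the precision and unroll factor at compile time, so the same source can be tuned per platform without touching the loop body. Whether the compiler actually vectorizes the inner loop depends on the compiler and its flags.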