SC Conference - Activity Details
A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures
Authors:
Ilya Lashuk (Georgia Institute of Technology)
Aparna Chandramowlishwaran (Georgia Institute of Technology)
Harper Langston (Georgia Institute of Technology)
Tuan-Anh Nguyen (Georgia Institute of Technology)
Rahul Sampath (Georgia Institute of Technology)
Aashay Shringarpure (Georgia Institute of Technology)
Rich Vuduc (Georgia Institute of Technology)
Lexing Ying (University of Texas at Austin)
Denis Zorin (New York University)
George Biros (Georgia Institute of Technology)
Papers Session
Particle Methods
Tuesday, 03:30PM - 04:00PM
Room PB255
Abstract:
We present new scalable algorithms and a new implementation of our
kernel-independent fast multipole method (Ying et al. ACM/IEEE SC
'03), in which we employ both distributed memory parallelism (via MPI)
and shared memory/streaming parallelism (via GPU acceleration) to
rapidly evaluate two-body non-oscillatory potentials. On traditional
CPU-only systems, our implementation scales well up to 30 billion
unknowns on 65K cores (AMD/CRAY-based Kraken system at NSF/NICS) for
highly non-uniform point distributions. On GPU-enabled systems, we
achieve a 30X speedup for problems of up to 256 million points on
256 GPUs (Lincoln at NSF/NCSA) over comparable CPU-only
implementations.
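
For orientation, the quantity a fast multipole method approximates is the direct sum of a two-body kernel over all source points. The following is a minimal sketch of that direct sum, not the authors' implementation. It assumes the 3D Laplace kernel 1/(4*pi*r) as one example of a non-oscillatory potential (the abstract does not name a specific kernel), and the function name direct_potential is hypothetical.

```python
import math

def direct_potential(targets, sources, densities):
    """Direct O(N*M) sum of the 3D Laplace kernel 1/(4*pi*r).

    Assumption: the Laplace kernel stands in for the general
    non-oscillatory two-body potentials mentioned in the abstract.
    """
    potentials = []
    for tx, ty, tz in targets:
        total = 0.0
        for (sx, sy, sz), q in zip(sources, densities):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:  # skip the singular self-interaction term
                total += q / (4.0 * math.pi * r)
        potentials.append(total)
    return potentials
```

The cost of this loop grows with the product of the number of targets and sources, which is why a hierarchical method is needed at the problem sizes quoted above.
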
We use a new MPI-based tree construction and partitioning, and
a new reduction algorithm for the evaluation phase. For the
sub-components of the evaluation phase, we use NVIDIA's CUDA
framework to achieve excellent performance. Taken together, these
components show promise for ultrascalable FMM in the petascale era
and beyond.
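
The abstract does not say how the tree is partitioned across processes. One common approach in octree-based codes, shown here purely as an illustrative assumption, is to order points along a Morton (Z-order) curve and hand each process a contiguous chunk. The sketch below is single-process Python with no MPI. The names morton_key and partition_points are hypothetical, and points are assumed to lie in the unit cube.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three cell indices (Z-order)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def partition_points(points, num_ranks, bits=10):
    """Sort points by Morton key, then cut into contiguous near-equal chunks.

    Assumption: every coordinate lies in [0, 1]; each chunk stands in
    for the points owned by one process.
    """
    cells = 1 << bits

    def key(p):
        # Map each coordinate to an integer cell index, clamping 1.0.
        ix, iy, iz = (min(int(c * cells), cells - 1) for c in p)
        return morton_key(ix, iy, iz, bits)

    ordered = sorted(points, key=key)
    n = len(ordered)
    return [ordered[r * n // num_ranks:(r + 1) * n // num_ranks]
            for r in range(num_ranks)]
```

Because nearby keys tend to correspond to nearby cells, contiguous chunks of the sorted list give each process a spatially compact set of points, which is the property a distributed tree code generally wants. Equal point counts per chunk are only a rough proxy for equal work on highly non-uniform distributions.
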