HomeSC is the International Conference for
 High Performnance Computing, Networking, Storage and Analysis
scyourway

SC Conference - Activity Details



Auto-Tuning 3-D FFT Library for CUDA GPUs

Authors:
Akira Nukada  (Tokyo Institute of Technology)
Satoshi Matsuoka  (Tokyo Institute of Technology)
Papers Session
GPU/SIMD Processing
Tuesday,  11:30AM - 12:00PM
Room PB252
Abstract:
Existing implementations of FFTs on GPUs are optimized for specific transform sizes like powers of two, and exhibit unstable and peaky performance i.e., do not perform as well in other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high performance CUDA kernels for FFTs of varying transform sizes, alleviating this problem. Although auto-tuning has been implemented on GPUs for dense kernels such as DGEMM and stencils, this is the first instance that has been applied comprehensively to bandwidth intensive and complex kernels such as 3-D FFTs. Bandwidth intensive optimizations such as selecting the number of threads and inserting padding to avoid bank conflicts on shared memory are systematically applied. Our resulting auto-tuner is fast and results in performance that essentially beats all 3-D FFT implementations on a single processor to date, and moreover exhibits stable performance irrespective of problem sizes or the underlying GPU hardware.
The full paper can be found in the ACM Digital Library and IEEE Computer Society
   Sponsors    ACM    IEEE