SC Conference - Activity Details
Art of Performance Tuning for CUDA and Manycore Architectures
Primary Session Leader:
Kevin Skadron
(University of Virginia)
Secondary Session Leaders:
High-throughput architectures for HPC seem likely to emphasize many cores with deep multithreading, wide SIMD, and sophisticated memory hierarchies. GPUs are one example, and their high throughput has led a number of researchers to port computationally intensive applications to NVIDIA's CUDA architecture.
This session will explore the art of performance tuning for CUDA. Topics will include profiling to identify bottlenecks, effective use of the GPU's memory hierarchy and DRAM interface to maximize bandwidth, data versus task parallelism, avoiding branch divergence, and effective use of native hardware functionality such as transcendentals and synchronization primitives to optimize CPU utilization. Many of the lessons learned in the context of CUDA are likely to apply to other manycore architectures used in HPC applications.
About half the time will be spent in an organized presentation by experienced CUDA programmers, and the other half in open discussion.