Session in Tutorial: Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems
2:4 Sparsity on GPU Tensor Cores
Rakesh Nagi · Jeff Pool
Recent NVIDIA GPUs include Tensor Core support for 2:4 sparsity to accelerate sparsified deep neural network models. In this session, we will first present what the 2:4 sparsity pattern is and why it strikes a good balance between performance and accuracy (regular vs. unstructured sparsity, fine-grained vs. coarse-grained). We will then explain how the speedup is achieved in hardware, along with some performance numbers, followed by details of the associated training process and some accuracy numbers. We will also discuss new techniques that search for permutations of model parameters to improve the efficiency of hardware execution. The session will end with practical ways and best practices for tapping into 2:4 sparsity from deep learning frameworks.
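The pruning step behind this pattern is simple enough to sketch. Below is a minimal, hypothetical PyTorch illustration (not NVIDIA's implementation; the function name `prune_2_to_4` is our own): for each contiguous group of four weights along the input dimension, keep the two with the largest magnitude and zero the other two, which is exactly the constraint the sparse Tensor Cores exploit.

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Return a copy of `weight` obeying the 2:4 sparsity pattern.

    Assumes a 2-D weight whose last dimension is divisible by 4.
    """
    out_features, in_features = weight.shape
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Find the two smallest-magnitude entries in each group of four...
    _, drop = groups.abs().topk(2, dim=-1, largest=False)
    # ...and zero them out, keeping the two largest.
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop, 0.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_sparse = prune_2_to_4(w)
# Every group of four now holds exactly two zeros.
assert ((w_sparse.reshape(8, -1, 4) == 0).sum(dim=-1) == 2).all()
```

In practice this is automated at the framework level: NVIDIA's Apex library, for example, ships an ASP (Automatic SParsity) module under `apex.contrib.sparsity` that prunes an already-trained model to the 2:4 pattern and keeps the mask fixed during subsequent fine-tuning.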