Many ML models are either sparsified (such as deep neural networks) to reduce memory footprint and FLOPs, or are inherently sparse due to their unstructured nature (such as graph neural networks). Nevertheless, even though sparsity is desirable in theory, it often hampers performance in practice because existing heterogeneous systems (such as GPUs and FPGAs) fall short on irregular computations. For example, because GPU architectures are optimized for regular, dense computations, only a small fraction of the theoretical peak GPU performance is realized when performing sparse computation. In this tutorial, we discuss the sources of sparsity in deep neural networks as well as key techniques for mapping sparse computation onto heterogeneous systems to support high-performance inference and training. We will conclude the tutorial with a discussion of future work on model parallelism, focusing on optimizing sparse communication for large-scale sparse ML models.
Tue 1:00 p.m. - 1:50 p.m. | Opening remarks and overview of sparsity in ML (Introduction)
We will give an overview of the sparsity problems in ML and summarize some of the latest work that addresses the challenges of handling sparsity from both the systems and the algorithm perspective. We will discuss sparsity both as a result of pruning dense models and as an inherent property of other ML models. We will then discuss the challenges of implementing sparse algorithms on heterogeneous systems. We will conclude this session with an overview of the runtime software libraries and tools, such as EMOGI, Pytorch-DGL, Tiled SpMM, BaM, and 2:4 sparsity, that we have developed recently to address the compute and memory challenges of handling sparsity in heterogeneous computing systems. This session aims to give the audience the foundation for understanding the rest of the sessions. See Wen-mei's bio here. See Jinjun's bio here.
Wen-Mei Hwu · Jinjun Xiong
Tue 1:50 p.m. - 2:45 p.m. | Tiled SpMM and its performance model on GPUs (Session)
Sparse matrix-dense matrix multiplication (SpMM) is a common operation in ML computations. In this session, we will give a tutorial on our loop reordering and tiling strategies for optimizing SpMM on GPUs. We will present extensive benchmark results on A100 GPUs showing that the proposed Tiled SpMM mechanism outperforms previous approaches and reaches the theoretical peak performance given by the sparsity pattern and the underlying architecture. We will then explain how a high-fidelity performance model based on memory bandwidth can be used to understand the measured performance of sparse-matrix tiling strategies and to identify additional optimizations such as load balancing and row/column permutation of the sparse matrix. For demonstration, we will use sparse deep neural network (DNN) inference with the MIT/Amazon/IEEE Graph Challenge benchmark networks as a running example throughout this session. A minimal code sketch of the tiling idea follows this entry. See Mert's bio here. Graph Challenge Codebase: here. Graph Challenge Publication: here. MLSys Presentation Slides: here.
Mert Hidayetoglu
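To make the tiling idea concrete, here is a minimal SciPy/NumPy sketch of column-tiled SpMM. It illustrates the loop tiling concept only, not the Tiled SpMM CUDA kernels presented in the session; the matrix sizes and the `tile_cols` parameter are illustrative assumptions.

```python
# Illustrative column-tiled SpMM: C = A @ B with a sparse A in CSR format.
# On a GPU, each tile of B rows would be staged in shared memory so the
# irregular accesses driven by A's sparsity pattern stay on-chip.
import numpy as np
import scipy.sparse as sp

def tiled_spmm(A_csr, B, tile_cols=1024):
    m, k = A_csr.shape
    C = np.zeros((m, B.shape[1]), dtype=B.dtype)
    for start in range(0, k, tile_cols):
        stop = min(start + tile_cols, k)
        A_tile = A_csr[:, start:stop]        # sparse column slice of A
        C += A_tile @ B[start:stop, :]       # accumulate the partial product
    return C

A = sp.random(4096, 4096, density=0.2, format="csr", dtype=np.float32)
B = np.random.rand(4096, 128).astype(np.float32)
assert np.allclose(tiled_spmm(A, B), A @ B, rtol=1e-3, atol=1e-3)
```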
Tue 2:45 p.m. - 3:45 p.m. | Sparse deep neural network inference on FPGAs (Session)
This session presents the design and implementation of a highly flexible sparse DNN inference accelerator on FPGAs using high-level synthesis (HLS). We will explain how custom sparse computation hardware synthesized from C/C++ and Python can achieve higher energy efficiency than CPUs and GPUs. The proposed inference engine can be easily configured for both mobile/edge computing and high-performance computing scenarios. Our evaluation shows that it effectively accelerates sparse DNNs and outperforms a CPU solution by up to 4.7x in energy efficiency. We will conclude with a survey of sparse support in related FPGA and ASIC accelerators. A sketch of the per-layer computation such an engine implements follows this entry. See Sitao's bio here.
Sitao Huang
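As context for what such an inference engine computes, here is a minimal SciPy sketch of the per-layer sparse DNN inference step, y = ReLU(W y + b) with a sparse W. The layer shapes, layer count, and bias value are illustrative assumptions, not the configuration evaluated on the FPGA.

```python
# Illustrative per-layer computation of a sparse DNN inference engine
# (software reference only, not HLS): each layer applies a sparse weight
# matrix, adds a bias, and clips negative values with ReLU.
import numpy as np
import scipy.sparse as sp

layers = [sp.random(1024, 1024, density=0.05, format="csr", dtype=np.float32)
          for _ in range(4)]
bias = np.float32(-0.3)

def sparse_dnn_infer(x):
    # x holds one column per input sample (features x batch).
    y = x
    for W in layers:
        y = np.maximum(W @ y + bias, 0.0)   # sparse matmul, bias, ReLU
    return y

batch = np.random.rand(1024, 32).astype(np.float32)
out = sparse_dnn_infer(batch)
```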
Tue 3:45 p.m. - 4:00 p.m. | Coffee break
Tue 4:00 p.m. - 4:45 p.m. | 2:4 Sparsity on GPU Tensor Cores (Session)
Recent NVIDIA GPUs have introduced support for 2:4 sparsity in their Tensor Cores to better support sparsified deep neural network models. In this session, we will first explain what the 2:4 sparsity pattern is and why it is a good choice for both performance and accuracy (regular vs. irregular/unstructured sparsity, fine-grained vs. coarse-grained). We will then explain how the speedup is achieved in hardware, along with performance numbers, followed by details of the associated training process and accuracy numbers. We will discuss new techniques that search for permutations of model parameters to improve the efficiency of the hardware execution. The session will end with practical ways and best practices to tap into 2:4 sparsity in deep learning frameworks; a small sketch of the pattern follows this entry. See Jeff's bio here. See Rakesh's bio here.
Rakesh Nagi · Jeff Pool
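As a concrete illustration of the pattern, here is a minimal NumPy sketch that enforces 2:4 sparsity on a weight matrix by keeping the two largest-magnitude values in every contiguous group of four along a row. It shows the pattern only; the training recipe and the Tensor Core execution covered in the session are not modeled here.

```python
# Illustrative 2:4 pruning: in each group of four consecutive weights along a
# row, zero the two smallest-magnitude entries so at most two nonzeros remain.
import numpy as np

def prune_2_4(W):
    rows, cols = W.shape
    assert cols % 4 == 0, "column count must be a multiple of 4"
    groups = W.reshape(rows, cols // 4, 4)
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]   # two smallest per group
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

W = np.random.randn(4, 8).astype(np.float32)
W_24 = prune_2_4(W)
# Each group of four now has at most two nonzeros (50% fine-grained sparsity).
assert ((W_24.reshape(4, -1, 4) != 0).sum(axis=-1) <= 2).all()
```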
Tue 4:45 p.m. - 5:00 p.m. | Future work and closing remarks (Conclusion)
Model parallelism is a technique used to address the ever-increasing demand for compute and memory capacity in deep learning training and inference; it allows models to scale to hundreds or thousands of GPUs. The communication pattern often depends on the model and can result in sparse, irregular accesses to neighboring GPUs, especially when computing sparse layers or graph operations. When frequent communication is required, it can dominate execution time. In this session, we discuss future work on optimizing sparse communication for massive sparse matrices. We target the communication architecture of multi-GPU nodes, where GPUs in the same node are connected with a high-bandwidth interconnect. We will present a proof of concept that reduces the communication bottleneck of sparse scatter and gather operations by over 60% on OLCF's Summit supercomputer. Finally, we will conclude this tutorial with remarks on open problems and the future outlook. A small sketch of the sparse-gather idea follows this entry. See Vikram's bio here.
Vikram Sharma Mailthody
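To illustrate why sparse gather helps, below is a minimal NumPy/SciPy sketch (no actual GPU communication) that compares the volume of a dense all-gather with gathering only the input rows a rank actually needs, as determined by the nonzero columns of its local sparse partition. The partition sizes, feature width, and density are illustrative assumptions.

```python
# Illustrative comparison of dense vs. sparse gather volume for a partitioned
# sparse layer: a rank only needs the remote input rows indexed by the nonzero
# columns of its local partition A_local.
import numpy as np
import scipy.sparse as sp

n_local_rows, n_global_rows, feat = 512, 8192, 64
A_local = sp.random(n_local_rows, n_global_rows, density=0.001,
                    format="csr", dtype=np.float32)

needed = np.unique(A_local.indices)          # input rows this rank touches

dense_bytes  = n_global_rows * feat * 4      # all-gather every input row
sparse_bytes = needed.size * feat * 4        # gather only the rows we use
print(f"dense gather: {dense_bytes / 2**20:.2f} MiB, "
      f"sparse gather: {sparse_bytes / 2**20:.2f} MiB "
      f"({100 * sparse_bytes / dense_bytes:.0f}% of dense)")
```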