Tutorial

Sparsity in Neural Networks: Optimization of Sparse Data Accesses and Communications on Heterogeneous Architectures

Mert Hidayetoglu

Room 204
Tue 30 Aug 1 p.m. PDT — 5 p.m. PDT

Abstract:

A plethora of ML models are either sparsified (such as deep neural networks) to reduce memory footprint and FLOPs, or are inherently sparse due to their unstructured nature (such as graph neural networks). Yet even though sparsity is desirable in theory, it hampers performance in practice because GPU architectures are optimized for regular computations and handle irregular ones poorly. As a result, only a tiny fraction of the theoretical GPU performance is utilized. In this tutorial, we discuss several new techniques from our research for improving the performance of sparse computations, with use cases in sparse deep neural networks, graph analytics and mining, and graph deep neural networks. The discussion is followed by a hands-on session based in part on the repositories we have developed for the MIT/Amazon/IEEE Graph Challenge effort. We will also present our work on FPGA-based hardware accelerators for sparse DNN inference and graph algorithms. These FPGA accelerators feature rapid design flows based on high-level synthesis and flexible hardware designs with tunable parameters. Our proposed FPGA acceleration engine can be easily configured for both mobile computing and high-performance computing scenarios. Our evaluation shows that the proposed acceleration engines significantly improve energy efficiency compared to other platforms.
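To make the trade-off concrete, the following is a minimal sketch (not taken from the tutorial materials) of magnitude-pruning a layer's weight matrix and running the same matrix product from its compressed sparse row (CSR) form. The sizes, the 90% sparsity level, and all variable names are illustrative assumptions: the sparse product stores and multiplies only the surviving weights, but it pays for that with indirect accesses through the CSR row pointers and column indices, which is exactly the irregularity that limits GPU utilization.

import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # dense layer weights (illustrative)
X = rng.standard_normal((1024, 256)).astype(np.float32)   # a batch of input activations

# Magnitude pruning: zero out all but the largest ~10% of weights.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)
W_csr = csr_matrix(W_pruned)  # stores only nonzero values, column indices, and row pointers

# Both products compute the same result; the sparse one skips the zeros
# at the cost of irregular, index-driven memory accesses.
Y_dense = W_pruned @ X
Y_sparse = W_csr @ X
print(np.allclose(Y_dense, Y_sparse, atol=1e-3))
print(f"fraction of weights stored: {W_csr.nnz / W.size:.1%}")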
