Tutorial: Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems
Sparse deep neural network inference on FPGAs
This session presents the design and implementation of a highly flexible sparse DNN inference accelerator on FPGAs using high-level synthesis (HLS). We will explain how custom sparse-computation hardware synthesized from C/C++ and Python can achieve higher energy efficiency than CPUs and GPUs. The proposed inference engine can be easily configured for both mobile/edge and high-performance computing scenarios. Evaluation shows that it effectively accelerates sparse DNNs, outperforming a CPU solution by up to 4.7x in energy efficiency. We will conclude with a survey of sparsity support in related FPGA and ASIC accelerators.
See Sitao's bio here.