Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

Shurui Li · Puneet Gupta

Keywords: [ systems for ml ] [ ml compilers and runtime ]

Oral presentation: ML Compilers & Runtime
Mon 29 Aug 2:15 p.m. PDT — 3:30 p.m. PDT


Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it difficult to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that includes network compression and acceleration of arbitrary sub-byte precision. The framework achieves up to 8x compression compared to 8-bit networks by sharing a pool of weights across the entire network. We further propose a bit-serial lookup-based software implementation that allows a runtime-bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit networks, with less than 1% accuracy drop.
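
The abstract does not spell out the mechanics, so the following is only a minimal sketch of one way a shared weight pool and a bit-serial lookup could fit together, not the authors' implementation. It assumes that weights are stored as indices into a small pool of weight vectors, that activations are unsigned integers processed one bit-plane at a time, and that each pool vector has a precomputed table of its dot products with every binary pattern. The names `GROUP`, `build_lut`, `bit_serial_dot`, and `reference_dot`, as well as the group size of 4, are hypothetical choices made for illustration.

```python
GROUP = 4  # weights per pool vector (assumed value, chosen for illustration)


def build_lut(pool):
    """For each pool vector, precompute its dot product with every
    GROUP-bit binary pattern (bit i of the pattern selects weight i)."""
    lut = []
    for vec in pool:
        row = []
        for pattern in range(1 << GROUP):
            row.append(sum(w for i, w in enumerate(vec) if (pattern >> i) & 1))
        lut.append(row)
    return lut


def bit_serial_dot(lut, pool_indices, activations, act_bits):
    """Dot product between pooled weights and unsigned activations,
    computed one activation bit-plane at a time via table lookups."""
    total = 0
    for g, idx in enumerate(pool_indices):
        group = activations[g * GROUP:(g + 1) * GROUP]
        for b in range(act_bits):
            # Gather bit b of each activation in the group into one pattern.
            pattern = 0
            for i, a in enumerate(group):
                pattern |= ((a >> b) & 1) << i
            # One lookup replaces GROUP multiply-accumulates for this bit-plane.
            total += lut[idx][pattern] * (1 << b)
    return total


def reference_dot(pool, pool_indices, activations):
    """Plain dot product with the pool expanded back into full weights."""
    weights = [w for idx in pool_indices for w in pool[idx]]
    return sum(w * a for w, a in zip(weights, activations))


# Hypothetical toy data: a pool of two vectors shared by three weight groups.
pool = [[1, -2, 3, 0], [-1, 4, 2, -3]]
pool_indices = [1, 0, 1]
activations = [5, 3, 7, 1, 0, 2, 6, 4, 15, 8, 9, 1]  # 4-bit unsigned values

lut = build_lut(pool)
result = bit_serial_dot(lut, pool_indices, activations, act_bits=4)
expected = reference_dot(pool, pool_indices, activations)
```

In this sketch the inner loop runs once per activation bit, so under these assumptions the work scales with the activation bitwidth, which is one way to picture the runtime-bitwidth tradeoff. Storage per weight group shrinks to a single pool index plus the shared pool and tables.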
