Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
Sambhav R. Jain · Albert Gural · Michael Wu · Chris Dick

Mon Mar 04:30 PM -- 07:00 PM PST @ Ballroom A #28

We propose a method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent. Contrary to prior work, we show that a careful analysis of the straight-through estimator for threshold gradients allows for a natural range-precision trade-off leading to better optima. Our quantizers are constrained to use power-of-2 scale factors and per-tensor scaling of weights and activations, which makes them amenable to hardware implementation. We present analytical support for the general robustness of our methods and empirically validate them on various CNNs for ImageNet classification. We achieve near-floating-point accuracy on traditionally difficult networks such as MobileNets with fewer than 5 epochs of quantized (8-bit) retraining. Finally, we present Graffitist, a framework that enables automatic quantization of TensorFlow graphs for TQT.
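
To make the quantizer described above concrete, here is a minimal NumPy sketch of the forward pass of a uniform symmetric per-tensor quantizer with a power-of-2 scale factor. It is an illustration under stated assumptions, not the authors' or Graffitist's implementation: the function name `quantize_pow2`, the parameterization by a log2-domain threshold `log2_t`, and the choice to round the threshold up to the next power of 2 are all assumptions made for this example. Training the threshold through a straight-through estimator is not shown.

```python
import numpy as np

def quantize_pow2(x, log2_t, bits=8):
    """Hypothetical helper: fake-quantize x with a signed, symmetric,
    per-tensor quantizer whose scale factor is a power of 2."""
    # Assumption: the threshold is rounded up to the next power of 2,
    # so the resulting scale factor is itself a power of 2.
    scale = 2.0 ** (np.ceil(log2_t) - (bits - 1))
    # Signed integer range for the given bit width, e.g. [-128, 127] for 8 bits.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Scale, round to the nearest integer, saturate, then scale back.
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale, scale

x = np.array([-1.5, -0.3, 0.0, 0.26, 0.9, 2.0])
xq, scale = quantize_pow2(x, log2_t=0.0, bits=8)
print(scale)  # 2**-7
print(xq)     # values outside roughly [-1, 1) saturate
```

The sketch shows the range-precision trade-off mentioned in the abstract: raising `log2_t` widens the representable range so fewer values saturate, but it also increases `scale`, which coarsens the rounding of values inside the range.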

Author Information

Sambhav R. Jain (Xilinx / Stanford)

Sambhav Jain works on machine learning at Xilinx with a focus on efficient training and inference of deep neural nets, including fixed-point modeling, low-precision training, quantization, compression, and graph optimizations. He graduated from Stanford University with a Master's in Electrical Engineering and was previously with Oracle and Texas Instruments.

Albert Gural (Stanford University)
Michael Wu (Xilinx, Inc.)
Chris Dick (Xilinx, Inc.)

Related Events (a corresponding poster, oral, or spotlight)