Lightweight architectural designs of Convolutional Neural Networks (CNNs), together with quantization, have paved the way for the deployment of demanding computer vision applications on mobile devices. In parallel, alternative formulations of the convolution operation, such as FFT, Strassen and Winograd, have been adapted for use in CNNs, offering further speedups. Winograd convolutions are the fastest known algorithm for spatially small convolutions, but exploiting their full potential comes with the burden of numerical error, rendering them unusable in quantized contexts. In this work we propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations to the learning of the model parameters, enabling the design of competitive quantized models without impacting model size. We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 5% higher classification accuracy on CIFAR-10. Finally, we propose wiNAS, a neural architecture search (NAS) framework that jointly optimizes a given macro-architecture for accuracy and latency, leveraging Winograd-aware layers. A Winograd-aware ResNet-18 optimized with wiNAS for CIFAR-10 results in a 2.9x speedup compared to im2row, one of the most widely used optimized convolution implementations, with no loss in accuracy.
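For context, below is a minimal NumPy sketch of the standard fixed-matrix Winograd F(2x2, 3x3) transform that the abstract refers to, using the B^T, G and A^T matrices of Lavin & Gray (2016). It only illustrates the baseline algorithm whose transforms introduce the numerical error discussed above; it is not the paper's Winograd-aware training or the relaxed (learnable) transformation matrices, and the function names are illustrative.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transformation matrices (Lavin & Gray, 2016).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G   = np.array([[1.0,  0.0, 0.0],
                [0.5,  0.5, 0.5],
                [0.5, -0.5, 0.5],
                [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = G @ g @ G.T        # filter transform into the Winograd domain
    V = B_T @ d @ B_T.T    # input-tile transform
    M = U * V              # element-wise product replaces the convolution
    return A_T @ M @ A_T.T # output transform back to the spatial domain

def direct_corr(d, g):
    """Reference 'valid' cross-correlation, as in a CNN conv layer."""
    return np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                     for i in range(2)])

d = np.random.randn(4, 4).astype(np.float32)
g = np.random.randn(3, 3).astype(np.float32)
print(np.allclose(winograd_f2x2_3x3(d, g), direct_corr(d, g), atol=1e-5))  # True
```

In float32 the two paths agree to within rounding error, but the scaled entries of G and the additions in B^T and A^T enlarge the dynamic range of the intermediate tensors, which is why plain Winograd breaks down under 8-bit quantization; the paper's Winograd-aware layers expose exactly these intermediate stages to training.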
Author Information
Javier Fernandez-Marques (University of Oxford)
Paul Whatmough (Arm ML Research Lab)
Andrew Mundy (Arm ML Research Lab)
Matthew Mattina (Arm ML Research Lab)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Searching for Winograd-aware Quantized Networks
  Tue. Mar 3rd 12:30 -- 03:00 AM, Room Ballroom A
More from the Same Authors
- 2021 Workshop: 2nd On-Device Intelligence Workshop
  Paul Whatmough · Vijay Janapa Reddi · Chuteng Zhou · Igor Fedorov · Matthew Mattina · Pete Warden · Ganesh Venkatesh · Vikas Chandra
- 2021 Poster: Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices
  Urmish Thakker · Paul Whatmough · Zhigang Liu · Matthew Mattina · Jesse Beu
- 2021 Oral: Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices
  Urmish Thakker · Paul Whatmough · Zhigang Liu · Matthew Mattina · Jesse Beu
- 2021 Poster: MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
  Colby Banbury · Chuteng Zhou · Igor Fedorov · Ramon Matas · Urmish Thakker · Dibakar Gope · Vijay Janapa Reddi · Matthew Mattina · Paul Whatmough
- 2021 Oral: MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
  Colby Banbury · Chuteng Zhou · Igor Fedorov · Ramon Matas · Urmish Thakker · Dibakar Gope · Vijay Janapa Reddi · Matthew Mattina · Paul Whatmough