
Invited Talk
Workshop: The 3rd On-Device Intelligence Workshop

System-Algorithm Co-Design for TinyML

There are billions of tiny IoT devices and microcontrollers worldwide. Deploying deep learning models on these devices is appealing but challenging because of their limited memory (e.g., 256KB, 2-3 orders of magnitude smaller than that of mobile phones). In this talk, we will discuss our recent efforts that employ system-algorithm co-design to enable tinyML inference and training.

We first propose MCUNet, a framework that jointly designs an efficient neural architecture (TinyNAS) and a lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. MCUNet is the first framework to achieve the milestone of 70% ImageNet top-1 accuracy on commercial microcontrollers.

We then examine the SRAM bottleneck of CNN inference and find that the first several blocks have significantly higher memory usage than the rest of the network. We propose MCUNetV2, which features a generic patch-based inference schedule that operates on only a small spatial region of the feature map at a time and significantly cuts down the peak memory, enabling more vision applications, such as object detection, for tinyML.

Finally, we extend the framework to support on-device training. We propose a sparse update scheme that selectively updates only the important weights for transfer learning, cutting down the training cost. This algorithmic innovation is implemented by the Tiny Training Engine (TTE), which prunes the backward computation graph and offloads workload from runtime to compile time. Our framework is the first practical solution for on-device transfer learning of visual recognition under 256KB SRAM, a memory footprint 1000x smaller than that of existing frameworks. We hope our work can inspire more tinyML applications at the edge.
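
The patch-based inference idea described in the abstract can be illustrated with a minimal sketch. This is not the MCUNetV2 or TinyEngine implementation: the function names (`conv3x3_valid`, `run_stage`, `run_stage_patched`), the single-channel 3x3 "valid" convolutions, the tile size, and the memory accounting (input plus output elements of the layer currently running) are all assumptions chosen to keep the example small. It only shows that running the first layers tile by tile gives the same result as running them on the whole feature map, while the largest working buffer shrinks.

```python
# Illustrative sketch only; names and memory model are hypothetical,
# not taken from MCUNetV2 or TinyEngine.

def conv3x3_valid(x, k):
    """3x3 'valid' cross-correlation on a 2D list of lists (single channel)."""
    h, w = len(x), len(x[0])
    return [[sum(x[i + di][j + dj] * k[di][dj]
                 for di in range(3) for dj in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]

def run_stage(x, kernels):
    """Run the layers one after another.

    Returns the output and a simplified peak-memory figure: the largest
    (input elements + output elements) of any single layer.
    """
    peak = 0
    for k in kernels:
        y = conv3x3_valid(x, k)
        peak = max(peak, len(x) * len(x[0]) + len(y) * len(y[0]))
        x = y
    return x, peak

def run_stage_patched(x, kernels, tile):
    """Run the same layers on one output tile at a time.

    Each tile needs an input crop enlarged by a halo, because every valid
    3x3 convolution shrinks each spatial dimension by 2. Halo regions of
    neighbouring tiles overlap, so some values are recomputed.
    """
    halo = 2 * len(kernels)
    out_h, out_w = len(x) - halo, len(x[0]) - halo
    out = [[0] * out_w for _ in range(out_h)]
    peak = 0
    for r in range(0, out_h, tile):
        for c in range(0, out_w, tile):
            th, tw = min(tile, out_h - r), min(tile, out_w - c)
            crop = [row[c:c + tw + halo] for row in x[r:r + th + halo]]
            y, p = run_stage(crop, kernels)
            peak = max(peak, p)
            for i in range(th):
                out[r + i][c:c + tw] = y[i]
    return out, peak

# Toy 20x20 integer "image" and two integer kernels (exact arithmetic).
image = [[(7 * i + 3 * j) % 11 for j in range(20)] for i in range(20)]
kernels = [
    [[1, 0, -1], [2, 0, -2], [1, 0, -1]],
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
]

full_out, full_peak = run_stage(image, kernels)
patch_out, patch_peak = run_stage_patched(image, kernels, tile=4)

print("outputs match:", full_out == patch_out)
print("peak elements, whole map:", full_peak)
print("peak elements, per patch:", patch_peak)
```

In this toy accounting the stored input image and the assembled output are not counted, and the overlapping halos mean the patched schedule does extra computation in exchange for the smaller working buffer.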
