Ubiquitous on-device artificial intelligence (AI) is the next step in transforming the myriad of
mobile computing devices in our everyday lives into a new class of truly “smart” devices capable
of constantly observing, learning, and adapting to their environment. Through advances in AI
technology, these intelligent devices will provide proactive assistance and enable new
applications, as well as make our lives safer and the world around us more energy efficient.
Present-day AI features, such as voice-based user interfaces on smartphones, often rely on a
connection to the cloud. In contrast, on-device AI promises to increase the energy efficiency,
privacy, responsiveness, and autonomy of embedded and edge devices by severing their tether
to the cloud. The 3rd On-Device Intelligence Workshop aims to advance the state of the art by
bringing together researchers and practitioners to discuss the key problems, disseminate new
research results, and provide practical tutorial material. Because on-device AI is inherently
multidisciplinary, collaboration across the traditional computing stack is crucial.
We aim to bring together experts to discuss solutions to the following key challenges:
(1) How do we design, train, and optimize ML models tailored to fit a plethora of edge
devices with constrained compute, storage, and energy budgets? (A minimal
quantization example appears after this list.)
(2) How can we ensure privacy and security in ways that are interpretable to users?
(3) How should mobile computing hardware evolve to support the increasing prevalence of
on-device AI workloads?
(4) How can industry and academia collaboratively develop standards and benchmarks to
stimulate the development of an on-device AI research ecosystem?
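As one concrete illustration of challenge (1), post-training quantization is a common first step for shrinking a trained model to fit a constrained device. The sketch below uses TensorFlow Lite's standard converter API; the tiny Keras model is a hypothetical placeholder standing in for any network targeted at the edge.

    import tensorflow as tf

    # Hypothetical placeholder model; any trained Keras model works here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(4),
    ])

    # Post-training quantization: the converter stores weights in reduced
    # precision, typically shrinking the model ~4x at a small accuracy cost.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    # The resulting flatbuffer can be deployed with a lightweight runtime
    # such as TensorFlow Lite for Microcontrollers.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)
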
The workshop agenda includes keynote talks, invited talks, paper talks, a panel discussion, and a
poster session. The keynote and invited presenters include prominent leaders across a range of
edge AI subfields. Community building and networking will be interspersed throughout to
facilitate future collaboration among experts in the algorithms, software, and hardware
engineering domains.
Thu 6:00 a.m. - 6:15 a.m.
Opening Remarks (Colby Banbury)
Thu 6:15 a.m. - 7:15 a.m.
Keynote: Nicholas Lane
Nicholas Lane is an Associate Professor in the Department of Computer Science and Technology at the University of Cambridge, where he leads the Machine Learning Systems lab, whose mission is to invent the next generation of breakthrough ML-centric systems. He is also the Laboratory Director at Samsung AI in Cambridge; in addition to leading this 50-person lab, which pursues a broad ML research agenda, he personally directs teams focused on distributed and on-device forms of learning.
Thu 7:30 a.m. - 8:00 a.m.
Invited Talk
Tatiana is a research scientist at Apple MLR working on semi-supervised and unsupervised learning, speech recognition, and federated learning.
Thu 8:00 a.m. - 8:30 a.m.
Lightning Talks
Six short papers.
Thu 8:30 a.m. - 9:15 a.m.
Poster Session
Thu 11:15 a.m. - 11:45 a.m.
Invited Talk: Nat Jeffries
Nat Jeffries is a founding engineer at Useful Sensors, where he designs privacy-preserving embedded ML sensors. He graduated from Carnegie Mellon University in 2016 with a degree in ECE. He joined Google, where he worked on embedded systems before joining Pete Warden to spin up TensorFlow Lite for Microcontrollers. He has previously spoken at TensorFlow World in São Paulo, Brazil, and guest lectured on TinyML at Harvard.
Thu 11:45 a.m. - 12:15 p.m.
Invited Talk: hls4ml
Born from the high-energy physics community at the Large Hadron Collider, hls4ml is an open-source Python package for machine learning inference on FPGAs (Field-Programmable Gate Arrays). It creates firmware implementations of machine learning algorithms by translating models from traditional open-source machine learning packages into optimized high-level synthesis C++, which can then be customized for a given use case and implemented on devices such as FPGAs and Application-Specific Integrated Circuits (ASICs). hls4ml can easily scale the implementation of a model to exploit the parallel processing capabilities FPGAs offer, supporting not only low-latency, high-throughput designs but also designs sized to fit on lower-cost, resource-constrained hardware. hls4ml also supports generating accelerators with different drivers that build minimal, self-contained implementations, enabling control from Python or C/C++ with little extra development effort or hardware expertise.
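To make the workflow above concrete, here is a minimal sketch using hls4ml's documented Keras conversion path. The placeholder model, output directory, and FPGA part number are illustrative assumptions, not recommendations.

    import tensorflow as tf
    import hls4ml

    # Placeholder Keras model standing in for a trained network.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])

    # Generate a baseline configuration (fixed-point precision, reuse factor, ...).
    config = hls4ml.utils.config_from_keras_model(model, granularity="model")

    # Translate the model into a high-level synthesis C++ project targeting
    # a specific FPGA part (the part string here is just an example device).
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir="hls4ml_prj",
        part="xcu250-figd2104-2L-e",
    )

    # Compile a bit-accurate C simulation and run inference against it.
    hls_model.compile()
    # predictions = hls_model.predict(x)  # x: NumPy array of model inputs
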
Thu 12:15 p.m. - 12:45 p.m.
Invited Talk: System-Algorithm Co-Design for TinyML
There are billions of tiny IoT devices and microcontrollers worldwide. Deploying deep learning models on these tiny devices is appealing but challenging due to their limited memory (e.g., 256KB, 2-3 orders of magnitude less than mobile phones). In this talk, we discuss our recent efforts to enable tinyML inference and training through system-algorithm co-design. We first propose MCUNet, a framework that jointly designs an efficient neural architecture (TinyNAS) and a lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers; MCUNet is the first framework to reach the milestone of 70% ImageNet top-1 accuracy on commercial microcontrollers. We then examine the SRAM bottleneck of CNN inference and find that the first several blocks have significantly higher memory usage. We propose MCUNetV2, featuring a generic patch-based inference schedule that operates on only a small spatial region of the feature map at a time, significantly cutting peak memory and enabling more vision applications, such as object detection, for tinyML. Finally, we extend the framework to support on-device training: a sparse update scheme selectively updates only the important weights for transfer learning, cutting the training cost. This algorithmic innovation is implemented in the Tiny Training Engine (TTE), which prunes the backward computation graph and offloads work from runtime to compile time. Our framework is the first practical solution for on-device transfer learning of visual recognition under 256KB of SRAM, 1000x smaller than existing frameworks. We hope our work can inspire more tinyML applications at the edge.
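The patch-based schedule can be illustrated with a toy example. The NumPy sketch below is my own construction, not code from MCUNetV2: it computes a 3x3 convolution one small spatial patch at a time, reading each patch with a one-pixel halo, so only a patch-sized input buffer needs to be live at once rather than the whole feature map.

    import numpy as np

    def conv3x3(x, w):
        # Valid 3x3 convolution over a single-channel feature map.
        H, W = x.shape
        out = np.empty((H - 2, W - 2))
        for i in range(H - 2):
            for j in range(W - 2):
                out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
        return out

    def patchwise_conv3x3(x, w, patch=8):
        # Same result, computed patch by patch. Each patch is read with a
        # 1-pixel halo so border pixels are correct; in a real deployment
        # each output patch would feed the next layer immediately, keeping
        # peak activation memory proportional to the patch, not the map.
        H, W = x.shape
        out = np.empty((H - 2, W - 2))
        for i in range(0, H - 2, patch):
            for j in range(0, W - 2, patch):
                ie = min(i + patch, H - 2)
                je = min(j + patch, W - 2)
                region = x[i:ie + 2, j:je + 2]  # input patch plus halo
                out[i:ie, j:je] = conv3x3(region, w)
        return out

    # Sanity check: the patch-based schedule matches the full-map result.
    x = np.random.rand(34, 34)
    w = np.random.rand(3, 3)
    assert np.allclose(conv3x3(x, w), patchwise_conv3x3(x, w))
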
Thu 1:00 p.m. - 1:30 p.m.
Invited Talk
Ankita is a Senior Staff Engineer in Wireless R&D at Qualcomm Technologies, Inc., with over eleven years of research and product development experience in hardware and software systems for the deep learning, computer vision, and wireless domains. She works on on-device ML initiatives for AI/ML-enabled 5G modems. She is also pursuing a doctoral degree at Stanford University, where her research focuses on energy-efficient agile hardware systems for deep learning and computer vision. Ankita has served as a reviewer and technical program committee member, and has published at various systems and architecture conferences.
Thu 1:30 p.m. - 2:00 p.m.
Panel Discussion
Thu 2:00 p.m. - 2:15 p.m.
Closing Remarks (Vijay Janapa Reddi)