Moderator: Mohammad Alizadeh
Yunfeng Lin · Guilin Li · Xing Zhang · Weinan Zhang · Bo Chen · Ruiming Tang · Zhenguo Li · Jiashi Feng · Yong Yu
Automated neural architecture search (NAS) methods have been demonstrated as a powerful tool to facilitate neural architecture design. However, the broad applicability of NAS has been restrained due to the difficulty of designing task-specific search spaces and the necessity and verbosity to implement every NAS component from scratch when switching to another search space. In this work, we propose ModularNAS, a framework that implements essential components of NAS in a modularized and unified manner. It enables automatic search space generation for customized use cases while reusing predefined search strategies, with little extra work needed for each case. We conduct extensive experiments to verify the improved model performance on various tasks by reusing supported NAS components over customized search spaces. We have also shown that targeting existing architectures, ModularNAS can find superior ones concerning accuracy and deployment efficiency, such as latency and FLOPS. The source code of our framework can be found at https://github.com/huawei-noah/vega/tree/master/vega/algorithms/nas/modnas.
Peifeng Yu · Jiachen Liu · Mosharaf Chowdhury
Current hyperparameter tuning solutions lack complementary execution engines to efficiently leverage distributed computation, thus ignoring the possibility of intra- and inter-GPU sharing, which exhibits poor resource usage. In this paper, we present Fluid, a generalized hyperparameter tuning execution engine, that coordinates between hyperparameter tuning jobs and cluster resources. Fluid schedules evaluation trials in such jobs using a water-filling approach to make the best use of resources both at intra- and inter-GPU granularities to speed up the tuning process. By abstracting a hyperparameter tuning job as a sequence of TrialGroup, Fluid can boost the performance of diverse hyperparameter tuning solutions. Our experiments show that Fluid can speed up synchronous BOHB by 200%, and BOHB and ASHA by 30% while having similar final accuracy.
Colby Banbury · Chuteng Zhou · Igor Fedorov · Ramon Matas · Urmish Thakker · Dibakar Gope · Vijay Janapa Reddi · Matthew Mattina · Paul Whatmough
Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency, and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source neural network (NN) inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection. Models and training scripts can be found at https://github.com/ARM-software/ML-zoo.
Eyal Cidon · Evgenya Pergament · Zain Asgar · Asaf Cidon · Sachin Katti
The same machine learning model running on different edge devices may produce highly-divergent outputs on a nearly-identical input. Possible reasons for the divergence include differences in the device sensors, the device's signal processing hardware and software, and its operating system and processors. This paper presents the first methodical characterization of the variations in model prediction across real-world mobile devices. We demonstrate that accuracy is not a useful metric to characterize prediction divergence, and introduce a new metric, instability, which captures this variation. We characterize different sources for instability, and show that differences in compression formats and image signal processing account for significant instability in object classification models. Notably, in our experiments, 14-17% of images produced divergent classifications across one or more phone models. We evaluate three different techniques for reducing instability. In particular, we adapt prior work on making models robust to noise in order to fine-tune models to be robust to variations across edge devices. We demonstrate our fine-tuning techniques reduce instability by 75%.