

Contributed 4 in Workshop: 2nd On-Device Intelligence Workshop

A Compiler-aware Framework of Network Pruning Search Achieving Beyond Real-Time Mobile Acceleration (Yanyu Li, Northeastern University)


Abstract:

With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes increasingly important to reduce unnecessary computation and increase execution speed. Prior methods toward this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations, which are essential for mobile acceleration. In this work, we propose NPS, a compiler-aware unified network pruning search, together with comprehensive compiler optimizations that support different DNNs and different pruning schemes, bridging the gap between weight pruning and NAS. Our framework achieves 6.7ms, 5.9ms, and 3.9ms ImageNet inference times with 77%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone, consistently outperforming prior work.
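To make the idea of a compiler-aware pruning search concrete, here is a minimal, self-contained Python sketch. It is not the authors' implementation: the layer names, pruning schemes, latency table, and accuracy proxy are all hypothetical placeholders standing in for a compiler-level latency model and an accuracy predictor; the point is only that candidate (scheme, ratio) assignments are scored against a latency budget derived from compiler-generated code rather than from FLOP counts alone.

```python
# Illustrative sketch (not the authors' code): a toy compiler-aware pruning
# search that picks a per-layer pruning scheme and ratio under a latency budget.
# Layer names, schemes, latencies, and the accuracy proxy are hypothetical.

import itertools

LAYERS = ["conv1", "conv2", "conv3"]            # hypothetical network layers
SCHEMES = ["unstructured", "block", "pattern"]  # candidate pruning schemes
RATIOS = [0.5, 0.7, 0.9]                        # candidate pruning ratios


def latency_ms(layer, scheme, ratio):
    """Hypothetical per-layer latency (ms) after compiler code generation.
    Structured schemes (block/pattern) are assumed to map better onto mobile
    SIMD/GPU kernels, so they are given a larger speedup at the same ratio."""
    base = {"conv1": 3.0, "conv2": 4.0, "conv3": 2.0}[layer]
    speedup = {
        "unstructured": 1.2,
        "block": 1.0 + 2.0 * ratio,
        "pattern": 1.0 + 1.6 * ratio,
    }[scheme]
    return base / speedup


def accuracy_proxy(assignment):
    """Hypothetical accuracy proxy: penalize aggressive pruning, with
    unstructured pruning assumed to retain accuracy slightly better."""
    score = 77.0
    for _, scheme, ratio in assignment:
        score -= ratio * (1.5 if scheme != "unstructured" else 1.0)
    return score


def search(latency_budget_ms):
    """Exhaustive search over per-layer (scheme, ratio) choices; a real
    framework would use a learned or evolutionary search instead."""
    per_layer = [[(l, s, r) for s in SCHEMES for r in RATIOS] for l in LAYERS]
    best = None
    for assignment in itertools.product(*per_layer):
        total_lat = sum(latency_ms(l, s, r) for l, s, r in assignment)
        if total_lat > latency_budget_ms:
            continue  # reject candidates the latency model deems too slow
        acc = accuracy_proxy(assignment)
        if best is None or acc > best[0]:
            best = (acc, total_lat, assignment)
    return best


if __name__ == "__main__":
    acc, lat, assignment = search(latency_budget_ms=6.7)
    print(f"proxy accuracy {acc:.1f}%, latency {lat:.2f} ms")
    for layer, scheme, ratio in assignment:
        print(f"  {layer}: scheme={scheme}, ratio={ratio}")
```

The key design point the sketch tries to capture is that the search space covers the pruning scheme per layer, not just the ratio, because different schemes compile to very different mobile kernel performance.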