MLSys 2022 AccMPEG: Optimizing Video Encoding for Accurate Video Analytics Oral

Oral

AccMPEG: Optimizing Video Encoding for Accurate Video Analytics

Kuntai Du · Kuntai Du · Qizheng Zhang · Qizheng Zhang · Anton Arapin · Anton Arapin · Haodong Wang · Haodong Wang · Zhengxu Xia · Zhengxu Xia · Junchen Jiang · Junchen Jiang

[ Abstract ] [ Visit Efficient Inference and Model Serving ]

[ Paper PDF]

Abstract:

With more videos being recorded by edge sensors (cameras) and analyzed by computer-vision deep neural nets (DNNs), a new breed of video streaming systems has emerged, with the goal to compress and stream videos to remote servers in real time while preserving enough information to allow highly accurate inference by the server-side DNNs. An ideal design of the video streaming system should simultaneously meet three key requirements: (1) low latency of encoding and streaming, (2) high accuracy of server-side DNNs, and (3) low compute overheads on the camera. Unfortunately, despite many recent efforts, such video streaming system has hitherto been elusive, especially when serving advanced vision tasks such as object detection or semantic segmentation.This paper presents AccMPEG, a new video encoding and streaming system that meets the three objectives. The key is to learn how much the encoding quality at each (16x16) macroblock can influence the server-side DNN accuracy, which we call accuracy gradients. Our insight is that these macroblock-level accuracy gradients can be inferred with sufficient precision by feeding the video frames through a cheap model. AccMPEG provides a suite of techniques that, given a new server-side DNN, can quickly create a cheap model to infer the accuracy gradients on any new frame in near realtime. Our extensive evaluation of AccMPEG on two types of edge devices (one Intel Xeon Silver 4100 CPU or NVIDIA Jetson Nano) and three vision tasks (six recent pre-trained DNNs) shows that compared to the state-of-the-art baselines, AccMPEG (with the same camera-side compute resources) can reduce the end-to-end inference delay by 10-43% without hurting accuracy.

Chat is not available.