Track: ML for Systems

Thu 16 May 15:30 - 15:50 PDT

29

On Latency Predictors for Neural Architecture Search

Yash Akhauri · Mohamed Abdelfattah

Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate NN architecture. Recent research has shown that the sample efficiency of these predictive models can be greatly improved through pre-training on some training devices with many samples, and then transferring the predictor on the test (target) device.Transfer learning and meta-learning methods have been used for this, but often exhibit significant performance variability.Additionally, the evaluation of existing latency predictors has been largely done on hand-crafted training/test device sets, making it difficult to ascertain design features that compose a robust and general latency predictor. To address these issues, we introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets.We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.Building on conclusions from our study, we present an end-to-end latency predictor training strategy that outperforms existing methods on 11 out of 12 difficult latency prediction tasks, improving latency prediction by 22.5% on average, and up to to 87.6% on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports a 5.8x speedup in wall-clock time.Our code is available at \href{http://www.releaseuponacceptance.com}{http://www.release_upon_acceptance.com}.

Thu 16 May 15:50 - 16:10 PDT

4

FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms

Haoran Qiu · Weichao Mao · Archit Patke · Shengkun Cui · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravi Iyer

The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5x faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production.

Thu 16 May 16:10 - 16:30 PDT

16

VQPy: An Object-Oriented Approach to Modern Video Analytics

Shan Yu · Zhenting Zhu · Yu Chen · Hanchen Xu · Pengzhan Zhao · Yang Wang · Arthi Padmanabhan · Hugo Latapie · Harry Xu

Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., human, animals, cars, etc.), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a front-end— a Python variant with constructs that make it easy for users to express video objects and their interactions—as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which is currently used in a major tech company as part of their DeepVision framework.

Thu 16 May 16:30 - 16:50 PDT

5

UniDM: A Unified Framework for Data Manipulation with Large Language Models

Yichen Qian · Yongyi He · Rong Zhu · Jintao Huang · Zhijian Ma · Haibin Wang · Yaohua Wang · Xiuyu Sun · Defu Lian · Bolin Ding · Jingren Zhou

Designing effective data manipulation methods is a long standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They exhibit bright benefits in terms of performance but still require customized designs to fit each specific task. This is very costly and can not catch up with the requirements of big data lake platforms. In this paper, inspired by the cross-task generality of LLMs on NLP tasks, we pave the first step to design an automatic and general solution to tackle with data manipulation tasks. We propose UniDM, a unified framework which establishes a new paradigm to process data manipulation tasks using LLMs. UniDM formalizes a number of data manipulation tasks in a unified form and abstracts three main general steps to solve each task. We develop an automatic context retrieval to allow the LLMs to retrieve data from data lakes, potentially containing evidence and factual information. For each step, we design effective prompts to guide LLMs to produce high quality results. By our comprehensive evaluation on a variety of benchmarks, our UniDM exhibits great generality and state-of-the-art performance on a wide variety of data manipulation tasks.

Main Navigation

Session

ML for Systems

On Latency Predictors for Neural Architecture Search

FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms

VQPy: An Object-Oriented Approach to Modern Video Analytics

UniDM: A Unified Framework for Data Manipulation with Large Language Models