A System for Massively Parallel Hyperparameter Tuning
Liam Li · Kevin Jamieson · Afshin Rostamizadeh · Ekaterina Gonina · Jonathan Ben-tzur · Moritz Hardt · Benjamin Recht · Ameet Talwalkar

Mon Mar 02 04:30 PM -- 07:00 PM (PST) @ Ballroom A #1

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize machine learning workloads, motivate the need to develop mature hyperparameter optimization functionality in distributed computing settings. We address this challenge by first introducing a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms existing state-of-the-art hyperparameter optimization methods; scales linearly with the number of workers in distributed settings; and is suitable for massive parallelism, converging to a high quality configuration in half the time taken by Vizier (Google’s internal hyperparameter optimization service) in an experiment with 500 workers. We then describe several design decisions we encountered, along with our associated solutions, when integrating ASHA in SystemX, an end-to-end production-quality machine learning system that offers hyperparameter tuning as a service.
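The abstract describes ASHA's core idea: asynchronously promote top-performing configurations to higher resource levels ("rungs") whenever a worker frees up, rather than waiting for a synchronous round to finish. The sketch below illustrates that promotion rule under stated assumptions; the reduction factor `ETA`, the toy objective, and all function names are illustrative, not taken from the paper's implementation.

```python
import random

# Illustrative sketch of asynchronous successive halving (ASHA):
# whenever a worker is free, promote a top-1/ETA configuration from
# the highest rung that has one unpromoted; otherwise start a fresh
# random configuration at the bottom rung. The objective is a toy
# stand-in, not a real training loss.

ETA = 3        # reduction factor: top 1/ETA of each rung is promotable
MAX_RUNG = 3   # rungs 0..MAX_RUNG; resource per config grows with rung

rungs = {r: [] for r in range(MAX_RUNG + 1)}       # rung -> [(loss, config)]
promoted = {r: set() for r in range(MAX_RUNG + 1)}  # configs already promoted

def get_job():
    """Return (config, rung) for a free worker, ASHA-style."""
    # Scan rungs top-down for a promotable configuration.
    for r in range(MAX_RUNG - 1, -1, -1):
        entries = sorted(rungs[r])          # lowest loss first
        k = len(entries) // ETA             # size of the promotable top slice
        for loss, cfg in entries[:k]:
            if cfg not in promoted[r]:
                promoted[r].add(cfg)
                return cfg, r + 1           # promote to the next rung
    # Nothing promotable: grow the bottom rung with a new random config.
    return random.random(), 0

def run_job(cfg, rung):
    """Toy 'training': loss shrinks with more resource (higher rung)."""
    loss = abs(cfg - 0.5) / (rung + 1)
    rungs[rung].append((loss, cfg))
    return loss

# Simulate a stream of free workers pulling jobs asynchronously.
for _ in range(200):
    cfg, rung = get_job()
    run_job(cfg, rung)

best = min(loss for r in rungs for loss, _ in rungs[r])
```

Because `get_job` never blocks waiting for a rung to fill, idle workers always have something to do, which is what lets the method scale with the number of workers.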

Author Information

Liam Li (Carnegie Mellon University)
Kevin Jamieson (U Washington)
Afshin Rostamizadeh (Google Research)
Ekaterina Gonina (Google)
Jonathan Ben-tzur (Determined AI)
Moritz Hardt (UC Berkeley)
Benjamin Recht (UC Berkeley)
Ameet Talwalkar (CMU)
