Achieving high performance for compute-intensive operators in machine learning (ML) workloads is a crucial but challenging task. Many ML and system practitioners rely on vendor libraries or auto-schedulers to do the job. While the former requires significant engineering effort, the latter supports only static-shape workloads in existing work. It is difficult, if not impractical, to apply existing auto-schedulers directly to dynamic-shape workloads, as doing so leads to extremely long auto-scheduling times.

We observe that the key challenge existing auto-schedulers face when handling a dynamic-shape workload is that they cannot construct a unified search space for all the possible shapes of the workload, because their search space is shape-dependent. To address this, we propose DietCode, a new auto-scheduler framework that efficiently supports dynamic-shape workloads by constructing a shape-generic search space and cost model. Under this construction, all shapes search jointly within the same space and update the same cost model during auto-scheduling, which makes DietCode more efficient than existing auto-schedulers.

We evaluate DietCode using state-of-the-art machine learning workloads on a modern GPU. Our evaluation shows that DietCode has the following key strengths when auto-scheduling an entire model end-to-end: (1) it reduces auto-scheduling time by up to 5.88x relative to the state-of-the-art auto-scheduler on uniformly sampled dynamic shapes (an estimated 94.1x if all possible shapes are included), and (2) it improves performance by up to 69.5% over the auto-scheduler and by up to 18.6% over the vendor library. All these advantages make DietCode an efficient and practical solution for dynamic-shape workloads.
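To make the shape-generic idea concrete, below is a minimal, self-contained Python sketch. It is not DietCode's actual API; the names, the tile-size candidate space, and the padding-based cost function are all hypothetical. It only illustrates the abstract's core point: one shape-independent set of schedule candidates is scored by one shared cost model for every dynamic shape, so shapes search jointly instead of each shape triggering its own search.

    # Hypothetical sketch (not DietCode's real API): a shape-generic
    # candidate space (tile sizes) searched jointly for all dynamic shapes
    # under a single shared cost model, rather than one search per shape.
    import math

    TILE_SIZES = [8, 16, 32, 64, 128]  # shape-independent schedule candidates

    def padding_overhead(extent, tile):
        # Toy shared cost model: a tile that does not evenly divide the
        # extent wastes work on padding; 1.0 means no wasted work.
        return math.ceil(extent / tile) * tile / extent

    def joint_search(extents, candidates=TILE_SIZES):
        # Every shape evaluates the SAME candidates against the SAME cost
        # model, so adding a shape enlarges the evaluation, not the space.
        return {e: min(candidates, key=lambda t: padding_overhead(e, t))
                for e in extents}

    if __name__ == "__main__":
        # Dynamic sequence lengths, e.g. from variable-length BERT inputs.
        for extent, tile in joint_search([53, 64, 97, 128, 511]).items():
            print(f"extent={extent:4d} -> tile={tile:3d} "
                  f"(overhead {padding_overhead(extent, tile):.2f}x)")

In a real auto-scheduler the cost model would be learned from measured kernel latencies rather than computed analytically, but the dispatch structure (one joint space, per-shape selection at the end) is the property the abstract describes.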
Author Information
Bojian Zheng (University of Toronto)
Ziheng Jiang (University of Washington and OctoML)
Cody Hao Yu (Amazon Web Services)
Haichen Shen (Amazon)
Joshua Fromm (OctoML, University of Washington)
Yizhi Liu (Amazon)
Yida Wang (Amazon)
Luis Ceze (University of Washington and OctoML)
Tianqi Chen (CMU)
Gennady Pekhimenko (University of Toronto)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: DietCode: Automatic Optimization for Dynamic Tensor Programs
  Mon, Aug 29th, 09:51 -- 10:09 PM, Room: Exhibit Hall A
More from the Same Authors
- 2022 Poster: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
  Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
- 2022 Poster: SRIFTY: Swift and Thrifty Distributed Neural Network Training on the Cloud
  Liang Luo · Peter West · Pratyush Patel · Arvind Krishnamurthy · Luis Ceze
- 2023 Poster: Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training
  Daniel Snider · Fanny Chevalier · Gennady Pekhimenko
- 2022 Symposium: Chips & Compilers
  Yida Wang · Gennady Pekhimenko
- 2022 Oral: SRIFTY: Swift and Thrifty Distributed Neural Network Training on the Cloud
  Liang Luo · Peter West · Pratyush Patel · Arvind Krishnamurthy · Luis Ceze
- 2022 Oral: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
  Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
- 2021: Industry/Academia Panel
  Zachary C Lipton · Udit Gupta · Lillian Pentecost · Shagun Sodhani · Abhishek Gupta · Mayoore Jaiswal · Michael Carbin · Devi Parikh · Gennady Pekhimenko
- 2021: "Machine Learning Tools: Skyline and RL-Scope" - Gennady Pekhimenko and James Gleeson (University of Toronto)
  Gennady Pekhimenko
- 2021: Thoughts on Research, Community and Impact
  Luis Ceze
- 2021 Poster: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
  Shang Wang · Peiming Yang · Yuxuan Zheng · Xin Li · Gennady Pekhimenko
- 2021 Poster: Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
  Isak Edo Vivancos · Sayeh Sharify · Daniel Ly-Ma · Ameer Abdelhadi · Ciaran Bannon · Milos Nikolic · Mostafa Mahmoud · Alberto Delmas Lascorz · Gennady Pekhimenko · Andreas Moshovos
- 2021 Oral: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
  Shang Wang · Peiming Yang · Yuxuan Zheng · Xin Li · Gennady Pekhimenko
- 2021 Oral: Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
  Isak Edo Vivancos · Sayeh Sharify · Daniel Ly-Ma · Ameer Abdelhadi · Ciaran Bannon · Milos Nikolic · Mostafa Mahmoud · Alberto Delmas Lascorz · Gennady Pekhimenko · Andreas Moshovos
- 2021 Poster: Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
  Haichen Shen · Jared Roesch · Zhi Chen · Wei Chen · Yong Wu · Mu Li · Vin Sharma · Zachary Tatlock · Yida Wang
- 2021 Poster: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads
  James Gleeson · Srivatsan Krishnan · Moshe Gabel · Vijay Janapa Reddi · Eyal de Lara · Gennady Pekhimenko
- 2021 Poster: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han
- 2021 Oral: Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
  Haichen Shen · Jared Roesch · Zhi Chen · Wei Chen · Yong Wu · Mu Li · Vin Sharma · Zachary Tatlock · Yida Wang
- 2021 Oral: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han
- 2021 Oral: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads
  James Gleeson · Srivatsan Krishnan · Moshe Gabel · Vijay Janapa Reddi · Eyal de Lara · Gennady Pekhimenko
- 2021 Poster: Cortex: A Compiler for Recursive Deep Learning Models
  Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
- 2021 Oral: Cortex: A Compiler for Recursive Deep Learning Models
  Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
- 2021: Panel Discussion
  Luis Ceze · Cliff Young · Chris Lattner
- 2021: Q&A for Tianqi Chen
  Tianqi Chen
- 2021: TVM
  Tianqi Chen
- 2021 Symposium: Chips and Compilers Symposium
  Mu Li · Tianqi Chen
- 2020 Oral: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Oral: Riptide: Fast End-to-End Binarized Neural Networks
  Joshua Fromm · Meghan Cowan · Matthai Philipose · Luis Ceze · Shwetak Patel
- 2020 Poster: PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the Public Cloud
  Liang Luo · Peter West · Jacob Nelson · Arvind Krishnamurthy · Luis Ceze
- 2020 Poster: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Poster: Riptide: Fast End-to-End Binarized Neural Networks
  Joshua Fromm · Meghan Cowan · Matthai Philipose · Luis Ceze · Shwetak Patel
- 2020 Poster: BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
  Shang Wang · Yifan Bai · Gennady Pekhimenko
- 2020 Demonstration: Skyline: Interactive In-editor Performance Visualizations and Debugging for DNN Training
  Geoffrey Yu · Tovi Grossman · Gennady Pekhimenko
- 2020 Oral: BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
  Shang Wang · Yifan Bai · Gennady Pekhimenko
- 2020 Oral: PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the Public Cloud
  Liang Luo · Peter West · Jacob Nelson · Arvind Krishnamurthy · Luis Ceze