Data accesses between on- and off-chip memories account for a large fraction of the overall energy consumed during deep learning inference. On-chip memory compression can greatly reduce this energy cost, provided the compression/decompression hardware stays simple and inexpensive while still shrinking the data effectively. We present Boveda, a simple and effective on-chip lossless memory compression technique for fixed-point precision networks. It reduces data widths by exploiting the value distributions that deep learning applications naturally exhibit. Boveda can increase effective on-chip capacity, reduce off-chip traffic, and/or achieve a desired performance/energy target with smaller on-chip memories. Boveda can be placed after any memory block in the on-chip memory hierarchy and can work with any data-parallel processing unit, such as the vector-like or tensor-core units of modern graphics processors, systolic arrays such as the one used in the Tensor Processing Unit (TPU), and units that process sparse tensors such as those in the SCNN accelerator. To demonstrate the potential of Boveda, we implement it over (i) SCNN, a state-of-the-art accelerator for sparse networks, (ii) a Tensor Core-like architecture, and (iii) the TPU. Boveda reduces memory footprint by 34% for SCNN and sparse models, on top of zero compression. For dense models, Boveda improves compression by 47%. We also present a prototype FPGA implementation.
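To make the core idea concrete, the sketch below packs a fixed-point (int8) tensor block by block, storing every value in a block at the bit width needed by the block's largest-magnitude element, plus a small per-block width header. This is a minimal, hypothetical illustration of per-block bit-width packing written for clarity; the block size, header format, and hardware-oriented encoding Boveda actually uses are described in the paper and may differ.

import numpy as np

BLOCK = 16        # values packed per block (assumed block size, for illustration only)
WIDTH_BITS = 4    # per-block header holding the chosen width (1..8 bits fits in 4 bits)

def _min_bits(v: int) -> int:
    """Minimal two's-complement width that can hold the signed integer v."""
    return (v.bit_length() + 1) if v >= 0 else ((-v - 1).bit_length() + 1)

def compress(tensor: np.ndarray):
    """Losslessly pack an int8 tensor: one width header per block, then
    BLOCK values stored as `width`-bit two's-complement fields."""
    flat = tensor.astype(np.int64).ravel()
    blocks, total_bits = [], 0
    for i in range(0, len(flat), BLOCK):
        blk = flat[i:i + BLOCK]
        width = max(_min_bits(int(v)) for v in blk)
        packed = [int(v) & ((1 << width) - 1) for v in blk]  # keep low `width` bits
        blocks.append((width, packed))
        total_bits += WIDTH_BITS + width * len(blk)
    return blocks, total_bits

def decompress(blocks, dtype=np.int8):
    out = []
    for width, packed in blocks:
        sign = 1 << (width - 1)
        out.extend((p ^ sign) - sign for p in packed)  # sign-extend each field
    return np.array(out, dtype=dtype)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Quantized DNN weights/activations are typically concentrated near zero.
    x = np.clip(rng.normal(0, 8, 4096), -128, 127).astype(np.int8)
    blocks, bits = compress(x)
    assert np.array_equal(decompress(blocks), x)  # lossless round trip
    print(f"compressed size: {bits / (8 * x.size):.2f}x of original")

Because the values are heavily concentrated around zero, most blocks need far fewer than 8 bits per element, which is where the footprint reduction in such a scheme comes from.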
Author Information
Isak Edo Vivancos (University of Toronto)
Sayeh Sharify (University of Toronto)
Daniel Ly-Ma (University of Toronto)
Ameer Abdelhadi (University of Toronto)
Ciaran Bannon (University of Toronto)
Milos Nikolic (University of Toronto)
Mostafa Mahmoud (University of Toronto)
Alberto Delmas Lascorz (University of Toronto)
Gennady Pekhimenko (University of Toronto)
Andreas Moshovos (University of Toronto)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick (09 Apr, 12:00 AM)
More from the Same Authors
- 2022 Poster: DietCode: Automatic Optimization for Dynamic Tensor Programs
  Bojian Zheng · Ziheng Jiang · Cody Hao Yu · Haichen Shen · Joshua Fromm · Yizhi Liu · Yida Wang · Luis Ceze · Tianqi Chen · Gennady Pekhimenko
- 2023 Poster: Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training
  Daniel Snider · Fanny Chevalier · Gennady Pekhimenko
- 2022 Symposium: Chips & Compilers
  Yida Wang · Gennady Pekhimenko
- 2022 Oral: DietCode: Automatic Optimization for Dynamic Tensor Programs
  Bojian Zheng · Ziheng Jiang · Cody Hao Yu · Haichen Shen · Joshua Fromm · Yizhi Liu · Yida Wang · Luis Ceze · Tianqi Chen · Gennady Pekhimenko
- 2021: Industry/Academia Panel
  Zachary C Lipton · Udit Gupta · Lillian Pentecost · Shagun Sodhani · Abhishek Gupta · Mayoore Jaiswal · Michael Carbin · Devi Parikh · Gennady Pekhimenko
- 2021: "Machine Learning Tools: Skyline and RL-Scope" - Gennady Pekhimenko and James Gleeson (University of Toronto)
  Gennady Pekhimenko
- 2021 Poster: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
  Shang Wang · Peiming Yang · Yuxuan Zheng · Xin Li · Gennady Pekhimenko
- 2021 Oral: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
  Shang Wang · Peiming Yang · Yuxuan Zheng · Xin Li · Gennady Pekhimenko
- 2021 Poster: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads
  James Gleeson · Srivatsan Krishnan · Moshe Gabel · Vijay Janapa Reddi · Eyal de Lara · Gennady Pekhimenko
- 2021 Poster: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han
- 2021 Oral: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han
- 2021 Oral: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads
  James Gleeson · Srivatsan Krishnan · Moshe Gabel · Vijay Janapa Reddi · Eyal de Lara · Gennady Pekhimenko
- 2020 Oral: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Poster: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Poster: BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
  Shang Wang · Yifan Bai · Gennady Pekhimenko
- 2020 Demonstration: Skyline: Interactive In-editor Performance Visualizations and Debugging for DNN Training
  Geoffrey Yu · Tovi Grossman · Gennady Pekhimenko
- 2020 Oral: BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
  Shang Wang · Yifan Bai · Gennady Pekhimenko