MLSys 2023 Accepted Papers

Safe Optimized Static Memory Allocation for Parallel Deep Learning [Memory Optimization]
Ioannis Lamprou (Huawei Technologies France) · Zhen Zhang (Huawei Paris Research Center) · Javier de Juan (Huawei Research Paris) · Hang Yang (Huawei) · Yongqiang Lai (Huawei) · Etienne Filhol (Huawei Research Paris) · Cedric Bastoul (Huawei)
FedTree: A Federated Learning System for Trees [Federated Learning]
Qinbin Li (UC Berkeley) · Zhaomin Wu (National University of Singapore) · Yanzheng Cai (Tsinghua University) · Yuxuan Han (National University of Singapore) · Ching Man Yung (Hitachi Ltd.) · Tianyuan Fu (National University of Singapore) · Bingsheng He (National University of Singapore)
Efficiently Scaling Transformer Inference [Measurement and Analysis]
Reiner Pope (MatX) · Sholto Douglas (Google) · Aakanksha Chowdhery (Google DeepMind) · Jacob Devlin · James Bradbury (Google) · Jonathan Heek (Google) · Kefan Xiao (Google) · Shivani Agrawal (Google) · Jeff Dean (Google)
Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware [Memory Optimization]
Horace He (PyTorch) · Shangdi Yu (Massachusetts Institute of Technology)
Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights [Correctness and Security]
Tianhang Zheng (University of Toronto) · Hao Lan (University of Toronto) · Baochun Li (University of Toronto)
X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation
Guoliang He (University of Cambridge) · Sean Parker (University of Cambridge) · Eiko Yoneki (University of Cambridge)
FLINT: A Platform for Federated Learning Integration [Federated Learning]
Ewen Wang (LinkedIn) · Boyi Chen (LinkedIn) · Mosharaf Chowdhury (University of Michigan, Ann Arbor) · Ajay Kannan (LinkedIn) · Franco Liang (LinkedIn)
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training [Parallel and Distributed Systems 2: Communication]
Zhuang Wang (Rice University) · Xinyu Wu (Rice University) · Zhaozhuo Xu (Rice University) · T. S. Eugene Ng (Rice University)
Renee: End-to-End Training of Extreme Classification Models
Vidit Jain (Microsoft Research) · Jatin Prakash (Microsoft) · Deepak Saini (Microsoft) · Jian Jiao (Microsoft) · Ramachandran Ramjee (Microsoft Research) · Manik Varma (Microsoft Research)
SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs [Compilers]
Yijin Li (Institute of Computing Technology, Chinese Academy of Sciences) · Jiacheng Zhao (Institute of Computing Technology, Chinese Academy of Sciences) · Qianqi Sun (Institute of Computing Technology, Chinese Academy of Sciences) · Haohui Mai (Cloud9 Technology Inc) · Lei Chen (Institute of Computing Technology, Chinese Academy of Sciences) · Wanlu Cao (Institute of Computing Technology, Chinese Academy of Sciences) · Yanfan Chen (University of Chinese Academy of Sciences) · Zhicheng Li (Institute of Computing Technology, Chinese Academy of Sciences) · Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences) · Xinyuan Zhang (Institute of Computing Technology, Chinese Academy of Sciences) · Xiyu Shi (Institute of Computing Technology, Chinese Academy of Sciences) · Jie Zhao (State Key Laboratory of Mathematical Engineering and Advanced Computing) · Jingling Xue (University of New South Wales) · Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences) · Xiaobing Feng (Institute of Computing Technology, Chinese Academy of Sciences)
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency [Correctness and Security]
Yan Wang (SenseTime Research) · Yuhang Li (Yale University) · Ruihao Gong (SenseTime) · Aishan Liu (Beihang University) · Yanfei Wang (SenseTime) · Jian Hu (SenseTime) · Yongqiang Yao (SenseTime) · Yunchen Zhang (UESTC) · Tianzi Xiaotian · Fengwei Yu (SenseTime Research) · Xianglong Liu (BUAA)
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs [Compilers]
Guyue Huang (UC Santa Barbara) · Yang Bai (CUHK) · Liu Liu (Rensselaer Polytechnic Institute) · Yuke Wang (University of California, Santa Barbara) · Bei Yu (CUHK) · Yufei Ding (University of California, Santa Barbara) · Yuan Xie (University of California, Santa Barbara)
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training [Parallel and Distributed Systems 2: Communication]
Borui Wan (The University of Hong Kong) · Juntao Zhao (The University of Hong Kong) · Chuan Wu (The University of Hong Kong)
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure [Storage, Scheduling, and Networking]
Mark Zhao (Stanford University) · Dhruv Choudhary (Facebook Inc.) · Devashish Tyagi (Meta Inc) · Ajay Somani (Meta Platforms Inc.) · Max Kaplan (Meta) · Sung-Han Lin (Meta) · Sarunya Pumma (Meta) · Jongsoo Park (Meta Platforms) · Aarti Basant (Meta) · Niket Agarwal (NVIDIA) · Carole-Jean Wu (Meta) · Christos Kozyrakis (Stanford University)
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network [Memory Optimization]
Vitaliy Chiley (MosaicML) · Vithursan Thangarasa (Cerebras Systems Inc) · Abhay Gupta (Cerebras Systems Inc.) · Anshul Samar (Cerebras Systems) · Joel Hestness (Cerebras) · Dennis DeCoste (Cerebras)
Building Verified Neural Networks for Computer Systems with Ouroboros [Correctness and Security]
Cheng Tan (Northeastern) · Changliu Liu (Carnegie Mellon University) · Zhihao Jia (Carnegie Mellon University) · Tianhao Wei (Carnegie Mellon University)
ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization [Measurement and Analysis]
Yifan Zhao (University of Illinois at Urbana-Champaign) · Hashim Sharif (University of Illinois at Urbana-Champaign) · Peter Pao-Huang (University of Illinois at Urbana-Champaign) · Vatsin Shah (University of Illinois at Urbana-Champaign) · Arun Narenthiran Sivakumar (University of Illinois at Urbana-Champaign) · Mateus Valverde Gasparino (University of Illinois at Urbana-Champaign) · Abdulrahman Mahmoud (Harvard University) · Nathan Zhao (University of Illinois at Urbana-Champaign) · Sarita Adve (University of Illinois at Urbana-Champaign) · Girish Chowdhary (University of Illinois at Urbana-Champaign) · Sasa Misailovic (UIUC) · Vikram Adve (University of Illinois)
Reducing Activation Recomputation in Large Transformer Models [Memory Optimization]
Vijay Anand Korthikanti (NVIDIA) · Jared Casper (NVIDIA) · Sangkug Lym (NVIDIA) · Lawrence McAfee (NVIDIA) · Michael Andersch (NVIDIA) · Mohammad Shoeybi (NVIDIA) · Bryan Catanzaro (NVIDIA)
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation [ML for Systems]
Le Chen (Iowa State University) · Quazi Ishtiaque Mahmud (Dept. of Computer Science, Iowa State University) · Hung Phan (Iowa State University) · Nesreen Ahmed (Intel Labs) · Ali Jannesari (Iowa State University)
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices [Parallel and Distributed Systems 1: Parallelism]
Kazuki Osawa (Google DeepMind) · Shigang Li (Beijing University of Posts and Telecommunications) · Torsten Hoefler (ETH Zurich)
Validating Large Language Models with ReLM [Correctness and Security]
Michael Kuchnik (Carnegie Mellon University) · Virginia Smith (Carnegie Mellon University) · George Amvrosiadis (Carnegie Mellon University)
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training [Measurement and Analysis]
Daniel Snider (University of Toronto) · Fanny Chevalier (University of Toronto) · Gennady Pekhimenko (University of Toronto)
Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization [Edge]
Zining Zhang (National University of Singapore) · Bingsheng He (National University of Singapore) · Zhenjie Zhang (Neuron Mobility)
Breadth-First Pipeline Parallelism [Parallel and Distributed Systems 1: Parallelism]
Joel Lamy-Poirier (ServiceNow Research)
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing [ML for Systems]
Yi Hu (Carnegie Mellon University) · Chaoran Zhang (Carnegie Mellon University) · Edward Andert (Arizona State University) · Harshul Singh (Carnegie Mellon University) · Aviral Shrivastava (Arizona State University) · James Laudon (Google) · Yanqi Zhou (Google) · Bob Iannucci (Google) · Carlee Joe-Wong (Carnegie Mellon University)
On Noisy Evaluation in Federated Hyperparameter Tuning [Federated Learning]
Kevin Kuo (Carnegie Mellon University) · Pratiksha Thaker (Carnegie Mellon University) · Mikhail Khodak (Carnegie Mellon University) · John Nguyen (Facebook) · Daniel Jiang (Facebook) · Ameet Talwalkar (CMU) · Virginia Smith (Carnegie Mellon University)
PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls [Storage, Scheduling, and Networking]
Pritam Damania (Tesla) · Shen Li (Meta) · Alban Desmaison (Meta) · Alisson Azzolini (Facebook) · Brian Vaughan (Meta) · Edward Yang (Meta) · Gregory Chanan (Meta) · Guoqiang Jerry Chen (Meta) · Hongyi Jia (Meta) · Howard Huang (Meta) · Joseph Spisak (Meta) · Luca Wehrstedt (Meta AI) · Lucas Hosseini (Meta) · Manoj Krishnan (Meta) · Omkar Salpekar (Meta AI) · Pavel Belevich (Meta) · Rohan Varma (Meta) · Satendra Gera · Wanchao Liang (Meta) · Shihao Xu (Meta) · Soumith Chintala (Meta AI) · Chaoyang He (FedML) · Amir Ziashahabi (University of Southern California) · Salman Avestimehr (FedML) · Zachary DeVito (Facebook AI Research)
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds [Sparsity 2: Systems]
Ke Hong (Tsinghua University) · Zhongming Yu (University of California, San Diego) · Guohao Dai (Shanghai Jiao Tong University) · Xinhao Yang (Tsinghua University) · Yaoxiu Lian (Shanghai Jiao Tong University) · Zehao Liu (Tsinghua University) · Ningyi Xu (Shanghai Jiao Tong University) · Yu Wang (Tsinghua University)
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts [Sparsity 1: Models and Algorithms]
Trevor Gale (Stanford University) · Deepak Narayanan (Microsoft Research) · Cliff Young (Google) · Matei Zaharia (Stanford and Databricks)
Unified Convolution Framework: A compiler-based approach to support sparse convolutions [Sparsity 2: Systems]
Jaeyeon Won (MIT) · Changwan Hong · Charith Mendis (University of Illinois at Urbana-Champaign) · Joel Emer (Massachusetts Institute of Technology) · Saman Amarasinghe (MIT)
Cuttlefish: Low-Rank Model Training without All the Tuning [Sparsity 1: Models and Algorithms]
Hongyi Wang (Carnegie Mellon University) · Saurabh Agarwal (UW-Madison) · Pongsakorn U-chupala (Sony Corporation) · Yoshiki Tanaka (Sony Corporation) · Eric Xing (MBZUAI, CMU, and Petuum Inc.) · Dimitris Papailiopoulos (University of Wisconsin-Madison)
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning [Federated Learning]
Shiqi He (University of British Columbia) · Qifan Yan (University of British Columbia) · Feijie Wu (Purdue University) · Lanjun Wang (Tianjin University) · Mathias Lécuyer (University of British Columbia) · Ivan Beschastnikh (University of British Columbia)
Edge Impulse: An MLOps Platform for Tiny Machine Learning [Edge]
Colby Banbury (Harvard University) · Vijay Janapa Reddi (Harvard University) · Alexander Elium (Edge Impulse) · Shawn Hymel (Edge Impulse) · David Tischler (Edge Impulse) · Daniel Situnayake (Edge Impulse) · Carl Ward (Edge Impulse) · Louis Moreau (Edge Impulse) · Jenny Plunkett (Edge Impulse) · Matthew Kelcey (Edge Impulse) · Mathijs Baaijens (Edge Impulse) · Alessandro Grande (Edge Impulse) · Dmitry Maslov (Edge Impulse) · Arthur Beavis (Edge Impulse) · Jan Jongboom (Edge Impulse) · Jessica Quaye (Harvard University)
On Optimizing the Communication of Model Parallelism [Parallel and Distributed Systems 2: Communication]
Yonghao Zhuang (Carnegie Mellon University) · Hexu Zhao · Lianmin Zheng (UC Berkeley) · Zhuohan Li (UC Berkeley) · Eric Xing (MBZUAI, CMU, and Petuum Inc.) · Qirong Ho (MBZUAI) · Joseph Gonzalez (UC Berkeley) · Ion Stoica (UC Berkeley) · Hao Zhang (UC Berkeley)
Tutel: Adaptive Mixture-of-Experts at Scale [Parallel and Distributed Systems 1: Parallelism]
Changho Hwang (Microsoft Research) · Wei Cui (Microsoft Research Asia) · Yifan Xiong (Microsoft Research) · Ziyue Yang (Microsoft Research) · Ze Liu (USTC) · Han Hu (Microsoft Research Asia) · Zilong Wang (Microsoft) · Rafael Salas (Microsoft) · Jithin Jose (Microsoft) · Prabhat Ram (Microsoft) · HoYuen Chau (Microsoft) · Peng Cheng · Fan Yang (Microsoft Research) · Mao Yang (MSRA) · Yongqiang Xiong (MSRA)
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning [Sparsity 2: Systems]
Bin Lin (Tsinghua University) · Ningxin Zheng · Lei Wang · Shijie Cao (Microsoft Research Asia) · Lingxiao Ma (Microsoft Research) · Quanlu Zhang · Yi Zhu (Microsoft Research Asia) · Ting Cao (MSRA) · Jilong Xue (Microsoft Research) · Yuqing Yang (Microsoft Research) · Fan Yang (Microsoft Research)
Subgraph Stationary Hardware-Software Inference Co-Design
Payman Behnam (Georgia Institute of Technology) · Alexey Tumanov (Georgia Tech) · Tushar Krishna (Georgia Institute of Technology) · Pranav Gadikar (Georgia Institute of Technology) · Yangyu Chen (Georgia Institute of Technology) · Jianming Tong (Georgia Tech) · Yue Pan (University of California San Diego) · Abhimanyu Rajeshkumar Bambhaniya (Georgia Institute of Technology) · Alind Khare (Georgia Tech)
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse [Emerging Models and Domains]
Hyoukjun Kwon (University of California, Irvine) · Krishnakumar Nair (Meta) · Jamin Seo (Georgia Institute of Technology) · Jason Yik (Harvard University) · Debabrata Mohapatra (Meta) · Dongyuan Zhan (Meta) · Jinook Song (Meta) · Peter Capak (Meta XRTech) · Peizhao Zhang (Meta) · Peter Vajda (Facebook) · Colby Banbury (Harvard) · Mark Mazumder (Harvard University) · Liangzhen Lai (Facebook Inc) · Ashish Sirasao (Meta Inc) · Tushar Krishna (Georgia Institute of Technology) · Harshit Khaitan (Meta) · Vikas Chandra (Meta) · Vijay Janapa Reddi (Harvard University)
AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs [ML for Systems]
Yaosheng Fu (NVIDIA) · Evgeny Bolotin (NVIDIA) · Aamer Jaleel (NVIDIA) · Gal Dalal (NVIDIA Research) · Shie Mannor (Nvidia) · Jacob Subag (NVIDIA) · Noam Korem (NVIDIA) · Michael Behar (Nvidia) · David Nellans (NVIDIA)
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models [Storage, Scheduling, and Networking]
Daochen Zha (Rice University) · Louis Feng (Meta) · Liang Luo (Meta Inc) · Bhargav Bhushanam (Facebook) · Zirui Liu (Rice University) · Yusuo Hu (Meta Inc.) · Jade Nie (Meta) · Yuzhen Huang (Meta) · Yuandong Tian (Meta) · Arun Kejariwal (Meta Platforms Inc.) · Xia Hu (Rice University)
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models [Sparsity 2: Systems]
Younghoon Byun (POSTECH) · Seungsik Moon (Pohang University of Science and Technology (POSTECH)) · Baeseong Park (NAVER CLOVA) · Se Jung Kwon (NAVER Cloud) · Dongsoo Lee (NAVER CLOVA) · Gunho Park (POSTECH) · Eunji Yoo (POSTECH) · Jung Gyu Min (Pohang University of Science and Technology (POSTECH)) · Youngjoo Lee (Pohang University of Science and Technology (POSTECH))
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs [Emerging Models and Domains]
Zhongming Yu (University of California, San Diego) · Guohao Dai (Shanghai Jiao Tong University) · Shang Yang (Tsinghua University) · Genghan Zhang (Tsinghua University) · Hengrui Zhang (Princeton University) · Feiwen Zhu (NVIDIA) · June Yang (NVIDIA) · Jishen Zhao (University of California, San Diego) · Yu Wang (Tsinghua University)
μ-TWO: 3× Faster Multi-Model Training with Orchestration and Memory Optimization [Storage, Scheduling, and Networking]
Sanket Purandare (Harvard University) · Abdul Wasay (Intel Labs) · Stratos Idreos (Harvard University) · Animesh Jain
Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching [Parallel and Distributed Systems 2: Communication]
Tim Kaler (MIT CSAIL) · Alexandros Iliopoulos (MIT CSAIL) · Philip Murzynowski (MIT) · Tao Schardl (MIT CSAIL) · Charles E. Leiserson (MIT CSAIL) · Jie Chen (MIT-IBM Watson AI Lab, IBM Research)
Virtual Machine Allocation with Lifetime Predictions [ML for Systems]
Hugo Barbalho (Microsoft) · Patricia Kovaleski (Microsoft) · Beibin Li (Microsoft Research) · Luke Marshall (Microsoft Research) · Marco Molinaro (Microsoft Research and PUC-Rio) · Abhisek Pan (Microsoft) · Eli Cortez (Microsoft) · Matheus Leao (Microsoft) · Harsh Patwari (University of Washington) · Zuzu Tang (Microsoft) · Larissa Rozales Gonçalves (Microsoft) · David Dion (Microsoft) · Thomas Moscibroda (Microsoft, USA) · Ishai Menache (Microsoft Research)
Uniform Sparsity in Deep Neural Networks [Sparsity 1: Models and Algorithms]
Saurav Muralidharan (NVIDIA)