MLSys 2023 Accepted Papers

Uniform Sparsity in Deep Neural Networks Sparsity 1: Models and Algorithms
Saurav Muralidharan
Ballroom B - Position 17
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds Sparsity 2: Systems
Ke Hong ⋅ Zhongming Yu ⋅ Guohao Dai ⋅ Xinhao Yang ⋅ Yaoxiu Lian ⋅ 泽浩 刘 ⋅ Ningyi Xu ⋅ Yu Wang
Ballroom B - Position 21
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs Emerging Models and Domains
Zhongming Yu ⋅ Guohao Dai ⋅ Shang Yang ⋅ Genghan Zhang ⋅ Hengrui Zhang ⋅ Feiwen Zhu ⋅ June Yang ⋅ Jishen Zhao ⋅ Yu Wang
Ballroom B - Position 39
Virtual Machine Allocation with Lifetime Predictions ML for Systems
Hugo Barbalho ⋅ Patricia Kovaleski ⋅ Beibin Li ⋅ Luke Marshall ⋅ Marco Molinaro ⋅ Abhisek Pan ⋅ Eli Cortez ⋅ Matheus Leao ⋅ Harsh Patwari ⋅ Zuzu Tang ⋅ Larissa Rozales Gonçalves ⋅ David Dion ⋅ Thomas Moscibroda ⋅ Ishai Menache
Ballroom B - Position 33
Breadth-First Pipeline Parallelism Parallel and Distributed Systems 1: Parallelism
Joel Lamy-Poirier
Ballroom B - Position 3
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure Storage, Scheduling, and Networking
Mark Zhao ⋅ Dhruv Choudhary ⋅ Devashish Tyagi ⋅ Ajay Somani ⋅ Max Kaplan ⋅ Sung-Han Lin ⋅ Sarunya Pumma ⋅ Jongsoo Park ⋅ Aarti Basant ⋅ Niket Agarwal ⋅ Carole-Jean Wu ⋅ Christos Kozyrakis
Ballroom B - Position 43
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency Correctness and Security
Yan Wang ⋅ Yuhang Li ⋅ Ruihao Gong ⋅ Aishan Liu ⋅ Yanfei Wang ⋅ Jian Hu ⋅ Yongqiang Yao ⋅ Yunchen Zhang ⋅ Tianzi Xiaotian ⋅ Fengwei Yu ⋅ Xianglong Liu
Ballroom B - Position 13
FLINT: A Platform for Federated Learning Integration Federated Learning
Ewen Wang ⋅ Boyi Chen ⋅ Mosharaf Chowdhury ⋅ Ajay Kannan ⋅ Franco Liang
Ballroom B - Position 27
Building Verified Neural Networks for Computer Systems with Ouroboros Correctness and Security
Cheng Tan ⋅ Changliu Liu ⋅ Zhihao Jia ⋅ Tianhao Wei
Ballroom B - Position 15
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training Parallel and Distributed Systems 2: Communication
Zhuang Wang ⋅ Xinyu Wu ⋅ Zhaozhuo Xu ⋅ T. S. Eugene Ng
Ballroom B - Position 4
Cuttlefish: Low-Rank Model Training without All the Tuning Sparsity 1: Models and Algorithms
Hongyi Wang ⋅ Saurabh Agarwal ⋅ Pongsakorn U-chupala ⋅ Yoshiki Tanaka ⋅ Eric Xing ⋅ Dimitris Papailiopoulos
Ballroom B - Position 18
Tutel: Adaptive Mixture-of-Experts at Scale Parallel and Distributed Systems 1: Parallelism
Changho Hwang ⋅ Wei Cui ⋅ Yifan Xiong ⋅ Ziyue Yang ⋅ Ze Liu ⋅ Han Hu ⋅ Zilong Wang ⋅ Rafael Salas ⋅ Jithin Jose ⋅ Prabhat Ram ⋅ HoYuen Chau ⋅ Peng Cheng ⋅ Fan Yang ⋅ Mao Yang ⋅ Yongqiang Xiong
Ballroom B - Position 2
Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights Correctness and Security
Tianhang Zheng ⋅ Hao Lan ⋅ Baochun Li
Ballroom B - Position 14
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing ML for Systems
Yi Hu ⋅ Chaoran Zhang ⋅ Edward Andert ⋅ Harshul Singh ⋅ Aviral Shrivastava ⋅ James Laudon ⋅ Yanqi Zhou ⋅ Bob Iannucci ⋅ Carlee Joe-Wong
Ballroom B - Position 31
Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching Parallel and Distributed Systems 2: Communication
Tim Kaler ⋅ Alexandros Iliopoulos ⋅ Philip Murzynowski ⋅ Tao Schardl ⋅ Charles E. Leiserson ⋅ Jie Chen
Ballroom B - Position 5
Subgraph Stationary Hardware-Software Inference Co-Design Edge
Payman Behnam ⋅ Alexey Tumanov ⋅ Tushar Krishna ⋅ Pranav Gadikar ⋅ Yangyu Chen ⋅ Jianming Tong ⋅ Yue Pan ⋅ Abhimanyu Rajeshkumar Bambhaniya ⋅ Alind Khare
Ballroom B - Position 46
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning Federated Learning
Shiqi He ⋅ Qifan Yan ⋅ Feijie Wu ⋅ Lanjun Wang ⋅ Mathias Lécuyer ⋅ Ivan Beschastnikh
Ballroom B - Position 29
FedTree: A Federated Learning System For Trees Federated Learning
Qinbin Li ⋅ Zhaomin Wu ⋅ Yanzheng Cai ⋅ Yuxuan Han ⋅ Ching Man Yung ⋅ Tianyuan Fu ⋅ Bingsheng He
Ballroom B - Position 26
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Memory Optimization
Vitaliy Chiley ⋅ Vithursan Thangarasa ⋅ Abhay Gupta ⋅ Anshul Samar ⋅ Joel Hestness ⋅ Dennis DeCoste
Ballroom B - Position 11
On Optimizing the Communication of Model Parallelism Parallel and Distributed Systems 2: Communication
Yonghao Zhuang ⋅ Lianmin Zheng ⋅ Zhuohan Li ⋅ Eric Xing ⋅ Qirong Ho ⋅ Joseph Gonzalez ⋅ Ion Stoica ⋅ Hao Zhang ⋅ Hexu Zhao
Ballroom B - Position 7
Validating Large Language Models with ReLM Correctness and Security
Michael Kuchnik ⋅ Virginia Smith ⋅ George Amvrosiadis
Ballroom B - Position 12
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning Sparsity 2: Systems
Bin Lin ⋅ Ningxin Zheng ⋅ Lei Wang ⋅ Shijie Cao ⋅ Lingxiao Ma ⋅ Quanlu Zhang ⋅ Yi Zhu ⋅ Ting Cao ⋅ Jilong Xue ⋅ Yuqing Yang ⋅ Fan Yang
Ballroom B - Position 19
Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware Memory Optimization
Horace He ⋅ Shangdi Yu
Ballroom B - Position 8
Efficiently Scaling Transformer Inference Measurement and Analysis
Reiner Pope ⋅ Sholto Douglas ⋅ Aakanksha Chowdhery ⋅ Jacob Devlin ⋅ James Bradbury ⋅ Jonathan Heek ⋅ Kefan Xiao ⋅ Shivani Agrawal ⋅ Jeff Dean
Ballroom B - Position 23
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training Measurement and Analysis
Daniel Snider ⋅ Fanny Chevalier ⋅ Gennady Pekhimenko
Ballroom B - Position 24
On Noisy Evaluation in Federated Hyperparameter Tuning Federated Learning
Kevin Kuo ⋅ Pratiksha Thaker ⋅ Mikhail Khodak ⋅ John Nguyen ⋅ Daniel Jiang ⋅ Ameet Talwalkar ⋅ Virginia Smith
Ballroom B - Position 28
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training Parallel and Distributed Systems 2: Communication
Borui Wan ⋅ Juntao Zhao ⋅ Chuan Wu
Ballroom B - Position 6
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation ML for Systems
Le Chen ⋅ Quazi Ishtiaque Mahmud ⋅ Hung Phan ⋅ Nesreen Ahmed ⋅ Ali Jannesari
Ballroom B - Position 32
ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization Measurement and Analysis
Yifan Zhao ⋅ Hashim Sharif ⋅ Peter Pao-Huang ⋅ Vatsin Shah ⋅ Arun Narenthiran Sivakumar ⋅ Mateus Valverde Gasparino ⋅ Abdulrahman Mahmoud ⋅ Nathan Zhao ⋅ Sarita Adve ⋅ Girish Chowdhary ⋅ Sasa Misailovic ⋅ Vikram Adve
Ballroom B - Position 25
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse Emerging Models and Domains
Hyoukjun Kwon ⋅ Krishnakumar Nair ⋅ Jamin Seo ⋅ Jason Yik ⋅ Debabrata Mohapatra ⋅ Dongyuan Zhan ⋅ Jinook Song ⋅ Peter Capak ⋅ Peizhao Zhang ⋅ Peter Vajda ⋅ Colby Banbury ⋅ Mark Mazumder ⋅ Liangzhen Lai ⋅ Ashish Sirasao ⋅ Tushar Krishna ⋅ Harshit Khaitan ⋅ Vikas Chandra ⋅ Vijay Janapa Reddi
Ballroom B - Position 37
AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs ML for Systems
Yaosheng Fu ⋅ Evgeny Bolotin ⋅ Aamer Jaleel ⋅ Gal Dalal ⋅ Shie Mannor ⋅ Jacob Subag ⋅ Noam Korem ⋅ Michael Behar ⋅ David Nellans
Ballroom B - Position 30
Unified Convolution Framework: A compiler-based approach to support sparse convolutions Sparsity 2: Systems
Jaeyeon Won ⋅ Changwan Hong ⋅ Charith Mendis ⋅ Joel Emer ⋅ Saman Amarasinghe
Ballroom B - Position 20
Edge Impulse: An MLOps Platform for Tiny Machine Learning Edge
Colby Banbury ⋅ Vijay Janapa Reddi ⋅ Alexander Elium ⋅ Shawn Hymel ⋅ David Tischler ⋅ Daniel Situnayake ⋅ Carl Ward ⋅ Louis Moreau ⋅ Jenny Plunkett ⋅ Matthew Kelcey ⋅ Mathijs Baaijens ⋅ Alessandro Grande ⋅ Dmitry Maslov ⋅ Arthur Beavis ⋅ Jan Jongboom ⋅ Jessica Quaye
Ballroom B - Position 45
Safe Optimized Static Memory Allocation for Parallel Deep Learning Memory Optimization
Ioannis Lamprou ⋅ Zhen Zhang ⋅ Javier de Juan ⋅ Hang Yang ⋅ Yongqiang Lai ⋅ Etienne Filhol ⋅ Cedric Bastoul
Ballroom B - Position 9
Renee: End-to-End Training of Extreme Classification Models Emerging Models and Domains
Vidit Jain ⋅ Jatin Prakash ⋅ Deepak Saini ⋅ Jian Jiao ⋅ Ramachandran Ramjee ⋅ Manik Varma
Ballroom B - Position 38
Reducing Activation Recomputation in Large Transformer Models Memory Optimization
Vijay Anand Korthikanti ⋅ Jared Casper ⋅ Sangkug Lym ⋅ Lawrence McAfee ⋅ Michael Andersch ⋅ Mohammad Shoeybi ⋅ Bryan Catanzaro
Ballroom B - Position 10
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models Sparsity 2: Systems
Younghoon Byun ⋅ Seungsik Moon ⋅ Baeseong Park ⋅ Se Jung Kwon ⋅ Dongsoo Lee ⋅ Gunho Park ⋅ Eunji Yoo ⋅ Jung Gyu Min ⋅ Youngjoo Lee
Ballroom B - Position 22
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs Compilers
Guyue Huang ⋅ Yang Bai ⋅ Liu Liu ⋅ Yuke Wang ⋅ Bei Yu ⋅ Yufei Ding ⋅ Yuan Xie
Ballroom B - Position 34
Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization Edge
Zining Zhang ⋅ Bingsheng He ⋅ Zhenjie Zhang
Ballroom B - Position 44
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices Parallel and Distributed Systems 1: Parallelism
Kazuki Osawa ⋅ Shigang Li ⋅ Torsten Hoefler
Ballroom B - Position 1
SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs Compilers
Yijin Li ⋅ Jiacheng Zhao ⋅ Sun Qianqi ⋅ Haohui Mai ⋅ Lei Chen ⋅ Wanlu Cao ⋅ Yanfan Chen ⋅ Li Zhicheng ⋅ Ying Liu ⋅ Xinyuan Zhang ⋅ Xiyu Shi ⋅ Jie Zhao ⋅ Jingling Xue ⋅ Huimin Cui ⋅ Xiaobing Feng
Ballroom B - Position 35
X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation Compilers
Guoliang He ⋅ Sean Parker ⋅ Eiko Yoneki
Ballroom B - Position 36
μ-TWO: 3× Faster Multi-Model Training with Orchestration and Memory Optimization Storage, Scheduling, and Networking
Sanket Purandare ⋅ Abdul Wasay ⋅ Stratos Idreos ⋅ Animesh Jain
Ballroom B - Position 41
PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls Storage, Scheduling, and Networking
Pritam Damania ⋅ Shen Li ⋅ Alban Desmaison ⋅ Alisson Azzolini ⋅ Brian Vaughan ⋅ Edward Yang ⋅ Gregory Chanan ⋅ Guoqiang Jerry Chen ⋅ Hongyi Jia ⋅ Howard Huang ⋅ Joseph Spisak ⋅ Luca Wehrstedt ⋅ Lucas Hosseini ⋅ Manoj Krishnan ⋅ Omkar Salpekar ⋅ Pavel Belevich ⋅ Rohan Varma ⋅ Satendra Gera ⋅ Wanchao Liang ⋅ Shihao Xu ⋅ Soumith Chintala ⋅ Chaoyang He ⋅ Amir Ziashahabi ⋅ Salman Avestimehr ⋅ Zachary DeVito
Ballroom B - Position 42
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models Storage, Scheduling, and Networking
Daochen Zha ⋅ Louis Feng ⋅ Liang Luo ⋅ Bhargav Bhushanam ⋅ Zirui Liu ⋅ Yusuo Hu ⋅ Jade Nie ⋅ Yuzhen Huang ⋅ Yuandong Tian ⋅ Arun Kejariwal ⋅ Xia Hu
Ballroom B - Position 40
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts Sparsity 1: Models and Algorithms
Trevor Gale ⋅ Deepak Narayanan ⋅ Cliff Young ⋅ Matei Zaharia
Ballroom B - Position 16