MLSys 2023 Accepted Papers
|
Uniform Sparsity in Deep Neural Networks
Sparsity 1: Models and Algorithms
Saurav Muralidharan
|
Ballroom B - Position 17 | |
|
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds
Sparsity 2: Systems
Ke Hong ⋅ Zhongming Yu ⋅ Guohao Dai ⋅ Xinhao Yang ⋅ Yaoxiu Lian ⋅ 泽浩 刘 ⋅ Ningyi Xu ⋅ Yu Wang
|
Ballroom B - Position 21 | |
|
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs
Emerging Models and Domains
Zhongming Yu ⋅ Guohao Dai ⋅ Shang Yang ⋅ Genghan Zhang ⋅ Hengrui Zhang ⋅ Feiwen Zhu ⋅ June Yang ⋅ Jishen Zhao ⋅ Yu Wang
|
Ballroom B - Position 39 | |
|
Virtual Machine Allocation with Lifetime Predictions
Hugo Barbalho ⋅ Patricia Kovaleski ⋅ Beibin Li ⋅ Luke Marshall ⋅ Marco Molinaro ⋅ Abhisek Pan ⋅ Eli Cortez ⋅ Matheus Leao ⋅ Harsh Patwari ⋅ Zuzu Tang ⋅ Larissa Rozales Gonçalves ⋅ David Dion ⋅ Thomas Moscibroda ⋅ Ishai Menache
|
Ballroom B - Position 33 | |
|
Breadth-First Pipeline Parallelism
Parallel and Distributed Systems 1: Parallelism
Joel Lamy-Poirier
|
Ballroom B - Position 3 | |
|
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure
Storage, Scheduling, and Networking
Mark Zhao ⋅ Dhruv Choudhary ⋅ Devashish Tyagi ⋅ Ajay Somani ⋅ Max Kaplan ⋅ Sung-Han Lin ⋅ Sarunya Pumma ⋅ Jongsoo Park ⋅ Aarti Basant ⋅ Niket Agarwal ⋅ Carole-Jean Wu ⋅ Christos Kozyrakis
|
Ballroom B - Position 43 | |
|
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Correctness and Security
Yan Wang ⋅ Yuhang Li ⋅ Ruihao Gong ⋅ Aishan Liu ⋅ Yanfei Wang ⋅ Jian Hu ⋅ Yongqiang Yao ⋅ Yunchen Zhang ⋅ Tianzi Xiaotian ⋅ Fengwei Yu ⋅ Xianglong Liu
|
Ballroom B - Position 13 | |
|
FLINT: A Platform for Federated Learning Integration
Federated Learning
Ewen Wang ⋅ Boyi Chen ⋅ Mosharaf Chowdhury ⋅ Ajay Kannan ⋅ Franco Liang
|
Ballroom B - Position 27 | |
|
Building Verified Neural Networks for Computer Systems with Ouroboros
Correctness and Security
Cheng Tan ⋅ Changliu Liu ⋅ Zhihao Jia ⋅ Tianhao Wei
|
Ballroom B - Position 15 | |
|
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training
Parallel and Distributed Systems 2: Communication
Zhuang Wang ⋅ Xinyu Wu ⋅ Zhaozhuo Xu ⋅ T. S. Eugene Ng
|
Ballroom B - Position 4 | |
|
Cuttlefish: Low-Rank Model Training without All the Tuning
Sparsity 1: Models and Algorithms
Hongyi Wang ⋅ Saurabh Agarwal ⋅ Pongsakorn U-chupala ⋅ Yoshiki Tanaka ⋅ Eric Xing ⋅ Dimitris Papailiopoulos
|
Ballroom B - Position 18 | |
|
Tutel: Adaptive Mixture-of-Experts at Scale
Parallel and Distributed Systems 1: Parallelism
Changho Hwang ⋅ Wei Cui ⋅ Yifan Xiong ⋅ Ziyue Yang ⋅ Ze Liu ⋅ Han Hu ⋅ Zilong Wang ⋅ Rafael Salas ⋅ Jithin Jose ⋅ Prabhat Ram ⋅ HoYuen Chau ⋅ Peng Cheng ⋅ Fan Yang ⋅ Mao Yang ⋅ Yongqiang Xiong
|
Ballroom B - Position 2 | |
|
Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights
Correctness and Security
Tianhang Zheng ⋅ Hao Lan ⋅ Baochun Li
|
Ballroom B - Position 14 | |
|
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing
ML for Systems
Yi Hu ⋅ Chaoran Zhang ⋅ Edward Andert ⋅ Harshul Singh ⋅ Aviral Shrivastava ⋅ James Laudon ⋅ Yanqi Zhou ⋅ Bob Iannucci ⋅ Carlee Joe-Wong
|
Ballroom B - Position 31 | |
|
Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching
Parallel and Distributed Systems 2: Communication
Tim Kaler ⋅ Alexandros Iliopoulos ⋅ Philip Murzynowski ⋅ Tao Schardl ⋅ Charles E. Leiserson ⋅ Jie Chen
|
Ballroom B - Position 5 | |
|
Subgraph Stationary Hardware-Software Inference Co-Design
Edge
Payman Behnam ⋅ Alexey Tumanov ⋅ Tushar Krishna ⋅ Pranav Gadikar ⋅ Yangyu Chen ⋅ Jianming Tong ⋅ Yue Pan ⋅ Abhimanyu Rajeshkumar Bambhaniya ⋅ Alind Khare
|
Ballroom B - Position 46 | |
|
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning
Federated Learning
Shiqi He ⋅ Qifan Yan ⋅ Feijie Wu ⋅ Lanjun Wang ⋅ Mathias Lécuyer ⋅ Ivan Beschastnikh
|
Ballroom B - Position 29 | |
|
FedTree: A Federated Learning System For Trees
Federated Learning
Qinbin Li ⋅ Zhaomin Wu ⋅ Yanzheng Cai ⋅ Yuxuan Han ⋅ Ching Man Yung ⋅ Tianyuan Fu ⋅ Bingsheng He
|
Ballroom B - Position 26 | |
|
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Memory Optimization
Vitaliy Chiley ⋅ Vithursan Thangarasa ⋅ Abhay Gupta ⋅ Anshul Samar ⋅ Joel Hestness ⋅ Dennis DeCoste
|
Ballroom B - Position 11 | |
|
On Optimizing the Communication of Model Parallelism
Parallel and Distributed Systems 2: Communication
Yonghao Zhuang ⋅ Lianmin Zheng ⋅ Zhuohan Li ⋅ Eric Xing ⋅ Qirong Ho ⋅ Joseph Gonzalez ⋅ Ion Stoica ⋅ Hao Zhang ⋅ Hexu Zhao
|
Ballroom B - Position 7 | |
|
Validating Large Language Models with ReLM
Michael Kuchnik ⋅ Virginia Smith ⋅ George Amvrosiadis
|
Ballroom B - Position 12 | |
|
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
Sparsity 2: Systems
Bin Lin ⋅ Ningxin Zheng ⋅ Lei Wang ⋅ Shijie Cao ⋅ Lingxiao Ma ⋅ Quanlu Zhang ⋅ Yi Zhu ⋅ Ting Cao ⋅ Jilong Xue ⋅ Yuqing Yang ⋅ Fan Yang
|
Ballroom B - Position 19 | |
|
Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware
Memory Optimization
Horace He ⋅ Shangdi Yu
|
Ballroom B - Position 8 | |
|
Efficiently Scaling Transformer Inference
Reiner Pope ⋅ Sholto Douglas ⋅ Aakanksha Chowdhery ⋅ Jacob Devlin ⋅ James Bradbury ⋅ Jonathan Heek ⋅ Kefan Xiao ⋅ Shivani Agrawal ⋅ Jeff Dean
|
Ballroom B - Position 23 | |
|
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training
Measurement and Analysis
Daniel Snider ⋅ Fanny Chevalier ⋅ Gennady Pekhimenko
|
Ballroom B - Position 24 | |
|
On Noisy Evaluation in Federated Hyperparameter Tuning
Federated Learning
Kevin Kuo ⋅ Pratiksha Thaker ⋅ Mikhail Khodak ⋅ John Nguyen ⋅ Daniel Jiang ⋅ Ameet Talwalkar ⋅ Virginia Smith
|
Ballroom B - Position 28 | |
|
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Parallel and Distributed Systems 2: Communication
Borui Wan ⋅ Juntao Zhao ⋅ Chuan Wu
|
Ballroom B - Position 6 | |
|
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation
ML for Systems
Le Chen ⋅ Quazi Ishtiaque Mahmud ⋅ Hung Phan ⋅ Nesreen Ahmed ⋅ Ali Jannesari
|
Ballroom B - Position 32 | |
|
ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization
Measurement and Analysis
Yifan Zhao ⋅ Hashim Sharif ⋅ Peter Pao-Huang ⋅ Vatsin Shah ⋅ Arun Narenthiran Sivakumar ⋅ Mateus Valverde Gasparino ⋅ Abdulrahman Mahmoud ⋅ Nathan Zhao ⋅ Sarita Adve ⋅ Girish Chowdhary ⋅ Sasa Misailovic ⋅ Vikram Adve
|
Ballroom B - Position 25 | |
|
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse
Emerging Models and Domains
Hyoukjun Kwon ⋅ Krishnakumar Nair ⋅ Jamin Seo ⋅ Jason Yik ⋅ Debabrata Mohapatra ⋅ Dongyuan Zhan ⋅ Jinook Song ⋅ Peter Capak ⋅ Peizhao Zhang ⋅ Peter Vajda ⋅ Colby Banbury ⋅ Mark Mazumder ⋅ Liangzhen Lai ⋅ Ashish Sirasao ⋅ Tushar Krishna ⋅ Harshit Khaitan ⋅ Vikas Chandra ⋅ Vijay Janapa Reddi
|
Ballroom B - Position 37 | |
|
AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs
ML for Systems
Yaosheng Fu ⋅ Evgeny Bolotin ⋅ Aamer Jaleel ⋅ Gal Dalal ⋅ Shie Mannor ⋅ Jacob Subag ⋅ Noam Korem ⋅ Michael Behar ⋅ David Nellans
|
Ballroom B - Position 30 | |
|
Unified Convolution Framework: A compiler-based approach to support sparse convolutions
Sparsity 2: Systems
Jaeyeon Won ⋅ Changwan Hong ⋅ Charith Mendis ⋅ Joel Emer ⋅ Saman Amarasinghe
|
Ballroom B - Position 20 | |
|
Edge Impulse: An MLOps Platform for Tiny Machine Learning
Edge
Colby Banbury ⋅ Vijay Janapa Reddi ⋅ Alexander Elium ⋅ Shawn Hymel ⋅ David Tischler ⋅ Daniel Situnayake ⋅ Carl Ward ⋅ Louis Moreau ⋅ Jenny Plunkett ⋅ Matthew Kelcey ⋅ Mathijs Baaijens ⋅ Alessandro Grande ⋅ Dmitry Maslov ⋅ Arthur Beavis ⋅ Jan Jongboom ⋅ Jessica Quaye
|
Ballroom B - Position 45 | |
|
Safe Optimized Static Memory Allocation for Parallel Deep Learning
Memory Optimization
Ioannis Lamprou ⋅ Zhen Zhang ⋅ Javier de Juan ⋅ Hang Yang ⋅ Yongqiang Lai ⋅ Etienne Filhol ⋅ Cedric Bastoul
|
Ballroom B - Position 9 | |
|
Renee: End-to-End Training of Extreme Classification Models
Emerging Models and Domains
Vidit Jain ⋅ Jatin Prakash ⋅ Deepak Saini ⋅ Jian Jiao ⋅ Ramachandran Ramjee ⋅ Manik Varma
|
Ballroom B - Position 38 | |
|
Reducing Activation Recomputation in Large Transformer Models
Memory Optimization
Vijay Anand Korthikanti ⋅ Jared Casper ⋅ Sangkug Lym ⋅ Lawrence McAfee ⋅ Michael Andersch ⋅ Mohammad Shoeybi ⋅ Bryan Catanzaro
|
Ballroom B - Position 10 | |
|
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
Sparsity 2: Systems
Younghoon Byun ⋅ Seungsik Moon ⋅ Baeseong Park ⋅ Se Jung Kwon ⋅ Dongsoo Lee ⋅ Gunho Park ⋅ Eunji Yoo ⋅ Jung Gyu Min ⋅ Youngjoo Lee
|
Ballroom B - Position 22 | |
|
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Compilers
Guyue Huang ⋅ Yang Bai ⋅ Liu Liu ⋅ Yuke Wang ⋅ Bei Yu ⋅ Yufei Ding ⋅ Yuan Xie
|
Ballroom B - Position 34 | |
|
Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization
Edge
Zining Zhang ⋅ Bingsheng He ⋅ Zhenjie Zhang
|
Ballroom B - Position 44 | |
|
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Parallel and Distributed Systems 1: Parallelism
Kazuki Osawa ⋅ Shigang Li ⋅ Torsten Hoefler
|
Ballroom B - Position 1 | |
|
SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs
Compilers
Yijin Li ⋅ Jiacheng Zhao ⋅ Sun Qianqi ⋅ Haohui Mai ⋅ Lei Chen ⋅ Wanlu Cao ⋅ Yanfan Chen ⋅ Li Zhicheng ⋅ Ying Liu ⋅ Xinyuan Zhang ⋅ Xiyu Shi ⋅ Jie Zhao ⋅ Jingling Xue ⋅ Huimin Cui ⋅ XiaoBing Feng
|
Ballroom B - Position 35 | |
|
X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation
Compilers
Guoliang He ⋅ Sean Parker ⋅ Eiko Yoneki
|
Ballroom B - Position 36 | |
|
μ-TWO: 3× Faster Multi-Model Training with Orchestration and Memory Optimization
Storage, Scheduling, and Networking
Sanket Purandare ⋅ Abdul Wasay ⋅ Stratos Idreos ⋅ Animesh Jain
|
Ballroom B - Position 41 | |
|
PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls
Storage, Scheduling, and Networking
Pritam Damania ⋅ Shen Li ⋅ Alban Desmaison ⋅ Alisson Azzolini ⋅ Brian Vaughan ⋅ Edward Yang ⋅ Gregory Chanan ⋅ Guoqiang Jerry Chen ⋅ Hongyi Jia ⋅ Howard Huang ⋅ Joseph Spisak ⋅ Luca Wehrstedt ⋅ Lucas Hosseini ⋅ Manoj Krishnan ⋅ Omkar Salpekar ⋅ Pavel Belevich ⋅ Rohan Varma ⋅ Satendra Gera ⋅ Wanchao Liang ⋅ Shihao Xu ⋅ Soumith Chintala ⋅ Chaoyang He ⋅ Amir Ziashahabi ⋅ Salman Avestimehr ⋅ Zachary DeVito
|
Ballroom B - Position 42 | |
|
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Storage, Scheduling, and Networking
Daochen Zha ⋅ Louis Feng ⋅ Liang Luo ⋅ Bhargav Bhushanam ⋅ Zirui Liu ⋅ Yusuo Hu ⋅ Jade Nie ⋅ Yuzhen Huang ⋅ Yuandong Tian ⋅ Arun Kejariwal ⋅ Xia Hu
|
Ballroom B - Position 40 | |
|
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Sparsity 1: Models and Algorithms
Trevor Gale ⋅ Deepak Narayanan ⋅ Cliff Young ⋅ Matei Zaharia
|
Ballroom B - Position 16 |