

MLSys 2024 List of Accepted Papers

Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Session: LLM 2
In Gim ⋅ Guojun Chen ⋅ Seung-seob Lee ⋅ Nikhil Sarda ⋅ Anurag Khandelwal ⋅ Lin Zhong
Poster Position Number 25
Efficient Post-training Quantization with FP8 Formats
Session: Quantization and Compression 2
Haihao Shen ⋅ Naveen Mellempudi ⋅ Xin He ⋅ Qun Gao ⋅ Chang Wang ⋅ Mengni Wang
Poster Position Number 28
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Session: Performance and Memory
Zhixu Du ⋅ Shiyu Li ⋅ Yuhao Wu ⋅ Xiangyu Jiang ⋅ Jingwei Sun ⋅ Qilin Zheng ⋅ Yongkai Wu ⋅ Ang Li ⋅ Hai Li ⋅ Yiran Chen
Poster Position Number 30
LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning
Session: Federated Learning
Shixiong Qi ⋅ K. K. Ramakrishnan ⋅ Myungjin Lee
Poster Position Number 3
ACROBAT: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Session: Performance and Memory
Pratik Fegade ⋅ Tianqi Chen ⋅ Phillip Gibbons ⋅ Todd Mowry
Poster Position Number 6
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
Session: Parallel and Distributed 2
Chenyu Jiang ⋅ Ye Tian ⋅ Zhen Jia ⋅ Chuan Wu ⋅ Yida Wang ⋅ Shuai Zheng
Poster Position Number 19
Does Compressing Activations Help Model Parallel Training?
Session: Measurement and Analysis
Song Bian ⋅ Dacheng Li ⋅ Hongyi Wang ⋅ Eric Xing ⋅ Shivaram Venkataraman
Poster Position Number 35
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
Session: LLM 2
Ke Hong ⋅ Guohao Dai ⋅ Jiaming Xu ⋅ Qiuli Mao ⋅ Xiuhong Li ⋅ Jun Liu ⋅ Kangdi Chen ⋅ Yuhan Dong ⋅ Yu Wang
Poster Position Number 33
Vidur: A Large-Scale Simulation Framework for LLM Inference
Session: Measurement and Analysis
Amey Agrawal ⋅ Nitin Kedia ⋅ Jayashree Mohan ⋅ Ashish Panwar ⋅ Nipun Kwatra ⋅ Bhargav Gulavani ⋅ Ramachandran Ramjee ⋅ Alexey Tumanov
Poster Position Number 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
Session: Measurement and Analysis
Yifei Xu ⋅ Yuning Chen ⋅ Xumiao Zhang ⋅ Xianshang Lin ⋅ Pan Hu ⋅ Yunfei Ma ⋅ Songwu Lu ⋅ Wan Du ⋅ Zhuoqing Mao ⋅ Ennan Zhai ⋅ Dennis Cai
Poster Position Number 2
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
Session: ML for Systems
Haoran Qiu ⋅ Weichao Mao ⋅ Archit Patke ⋅ Shengkun Cui ⋅ Chen Wang ⋅ Hubertus Franke ⋅ Zbigniew Kalbarczyk ⋅ Tamer Basar ⋅ Ravi Iyer
Poster Position Number 4
UniDM: A Unified Framework for Data Manipulation with Large Language Models
Session: ML for Systems
Yichen Qian ⋅ Yongyi He ⋅ Rong Zhu ⋅ Jintao Huang ⋅ Zhijian Ma ⋅ Haibin Wang ⋅ Yaohua Wang ⋅ Xiuyu Sun ⋅ Defu Lian ⋅ Bolin Ding ⋅ Jingren Zhou
Poster Position Number 5
FedTrans: Efficient Federated Learning via Multi-Model Transformation
Session: Federated Learning
Yuxuan Zhu ⋅ Jiachen Liu ⋅ Mosharaf Chowdhury ⋅ Fan Lai
Poster Position Number 7
SLoRA: Scalable Serving of Thousands of LoRA Adapters
Session: Large Language Models 1
Ying Sheng ⋅ Shiyi Cao ⋅ Dacheng Li ⋅ Coleman Hooper ⋅ Nicholas Lee ⋅ Shuo Yang ⋅ Christopher Chou ⋅ Banghua Zhu ⋅ Lianmin Zheng ⋅ Kurt Keutzer ⋅ Joseph Gonzalez ⋅ Ion Stoica
Poster Position Number 9
JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
Session: Quantization and Compression 2
Mohamed Ibrahim ⋅ Shaizeen Aga ⋅ Ada Li ⋅ Suchita Pati ⋅ Mahzabeen Islam
Poster Position Number 8
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
Session: Federated Learning
Gyudong Kim ⋅ Mehdi Ghasemi ⋅ Soroush Heidari ⋅ Seungryong Kim ⋅ Young Geun Kim ⋅ Sarma Vrudhula ⋅ Carole-Jean Wu
Poster Position Number 10
Fine-Tuning Language Models Using Formal Methods Feedback: A Use Case in Autonomous Systems
Session: Large Language Models 1
Yunhao Yang ⋅ Neel P. Bhatt ⋅ Tyler Ingebrand ⋅ William Ward ⋅ Steven Carr ⋅ Atlas Wang ⋅ Ufuk Topcu
Poster Position Number 11
Distributed Matrix-Based Sampling for Graph Neural Network Training
Session: Parallel and Distributed 1
Alok Tripathy ⋅ Katherine Yelick ⋅ Aydin Buluc
Poster Position Number 12
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving
Session: Quantization and Compression 1
Yilong Zhao ⋅ Chien-Yu Lin ⋅ Kan Zhu ⋅ Zihao Ye ⋅ Lequn Chen ⋅ Size Zheng ⋅ Luis Ceze ⋅ Arvind Krishnamurthy ⋅ Tianqi Chen ⋅ Baris Kasikci
Poster Position Number 13
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Session: Performance and Memory
Size Zheng ⋅ Renze Chen ⋅ Meng Li ⋅ Zihao Ye ⋅ Luis Ceze ⋅ Yun Liang
Poster Position Number 14
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
Session: Quantization and Compression 1
Ji Lin ⋅ Jiaming Tang ⋅ Haotian Tang ⋅ Shang Yang ⋅ Wei-Ming Chen ⋅ Wei-Chen Wang ⋅ Guangxuan Xiao ⋅ Xingyu Dang ⋅ Chuang Gan ⋅ Song Han
Poster Position Number 15
VQPy: An Object-Oriented Approach to Modern Video Analytics
Session: ML for Systems
Shan Yu ⋅ Zhenting Zhu ⋅ Yu Chen ⋅ Hanchen Xu ⋅ Pengzhan Zhao ⋅ Yang Wang ⋅ Arthi Padmanabhan ⋅ Hugo Latapie ⋅ Harry Xu
Poster Position Number 16
COMET: Neural Cost Model Explanation Framework
Session: Measurement and Analysis
Isha Chaudhary ⋅ Alex Renda ⋅ Charith Mendis ⋅ Gagandeep Singh
Poster Position Number 17
Schrodinger's FP: Training Neural Networks with Dynamic Floating-Point Containers
Session: Quantization and Compression 2
Milos Nikolic ⋅ Enrique Torres Sanchez ⋅ Jiahui Wang ⋅ Ali Hadi Zadeh ⋅ Mostafa Mahmoud ⋅ Ameer Abdelhadi ⋅ Kareem Ibrahim ⋅ Andreas Moshovos
Poster Position Number 18
Accelerating ReLU for MPC-Based Private Inference with a Communication-Efficient Sign Estimation
Session: Privacy and security
Kiwan Maeng ⋅ G. Edward Suh
Poster Position Number 20
L-GreCo: Layerwise-adaptive Gradient Compression For Efficient Data-parallel Deep Learning
Session: Parallel and Distributed 1
Ilia Markov ⋅ Kaveh Alim ⋅ Elias Frantar ⋅ Dan Alistarh
Poster Position Number 21
Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference
Session: LLM 2
Muhammad Adnan ⋅ Akhil Arunkumar ⋅ Gaurav Jain ⋅ Prashant Nair ⋅ Ilya Soloveychik ⋅ Purushotham Kamath
Poster Position Number 22
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Session: Quantization and Compression 2
Jian Meng ⋅ Yuan Liao ⋅ Anupreetham Anupreetham ⋅ Ahmed Hasssan ⋅ Shixing Yu ⋅ Han-sok Suh ⋅ Xiaofeng Hu ⋅ Jae-sun Seo
Poster Position Number 23
Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption
Session: Privacy and security
Jingtian Dang ⋅ Jianming Tong ⋅ Anupam Golder ⋅ Cong "Callie" Hao ⋅ Arijit Raychowdhury ⋅ Tushar Krishna
Poster Position Number 24
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache
Session: Large Language Models 1
Zhenyu Zhang ⋅ Shiwei Liu ⋅ Runjin Chen ⋅ Bhavya Kailkhura ⋅ Beidi Chen ⋅ Atlas Wang
Poster Position Number 26
Proteus: Preserving Model Confidentiality during Graph Optimizations
Session: Privacy and security
Yubo Gao ⋅ Maryam Haghifam ⋅ Christina Giannoula ⋅ Renbo Tu ⋅ Gennady Pekhimenko ⋅ Nandita Vijaykumar
Poster Position Number 27
On Latency Predictors for Neural Architecture Search
Session: ML for Systems
Yash Akhauri ⋅ Mohamed Abdelfattah
Poster Position Number 29
QMoE: Sub-1-Bit Compression of Trillion Parameter Models
Session: Quantization and Compression 1
Elias Frantar ⋅ Dan Alistarh
Poster Position Number 31
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
Session: Parallel and Distributed 1
Ye Tian ⋅ Zhen Jia ⋅ Ziyue Luo ⋅ Yida Wang ⋅ Chuan Wu
Poster Position Number 32
Punica: Multi-Tenant LoRA Serving
Session: Large Language Models 1
Lequn Chen ⋅ Zihao Ye ⋅ Yongji Wu ⋅ Danyang Zhuo ⋅ Luis Ceze ⋅ Arvind Krishnamurthy
Poster Position Number 34
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation
Session: Parallel and Distributed 2
Liang Luo ⋅ Buyun Zhang ⋅ Michael Tsang ⋅ Yinbin Ma ⋅ Ching-Hsiang Chu ⋅ Yuxin Chen ⋅ Shen Li ⋅ Yuchen Hao ⋅ Yanli Zhao ⋅ Guna Lakshminarayanan ⋅ Ellie Wen ⋅ Jongsoo Park ⋅ Dheevatsa Mudigere ⋅ Maxim Naumov
Poster Position Number 36
HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Session: Parallel and Distributed 2
Xuanlei Zhao ⋅ Bin Jia ⋅ Haotian Zhou ⋅ Ziming Liu ⋅ Shenggan Cheng ⋅ Yang You
Poster Position Number 37