MLSys 2024 List of Accepted Papers
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
Federated Learning
Gyudong Kim · Mehdi Ghasemi · Soroush Heidari · Seungryong Kim · Young Geun Kim · Sarma Vrudhula · Carole-Jean Wu
SLoRA: Scalable Serving of Thousands of LoRA Adapters
Large Language Models 1
Ying Sheng · Shiyi Cao · Dacheng Li · Coleman Hooper · Nicholas Lee · Shuo Yang · Christopher Chou · Banghua Zhu · Lianmin Zheng · Kurt Keutzer · Joseph Gonzalez · Ion Stoica
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache
Large Language Models 1
Zhenyu Zhang · Shiwei Liu · Runjin Chen · Bhavya Kailkhura · Beidi Chen · Atlas Wang
Efficient Post-training Quantization with FP8 Formats
Quantization and Compression 2
Haihao Shen · Naveen Mellempudi · Xin He · Qun Gao · Chang Wang · Mengni Wang
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
Parallel and Distributed 1
Ye Tian · Zhen Jia · Ziyue Luo · Yida Wang · Chuan Wu
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
Measurement and Analysis
Yifei Xu · Yuning Chen · Xumiao Zhang · Xianshang Lin · Pan Hu · Yunfei Ma · Songwu Lu · Wan Du · Zhuoqing Mao · Ennan Zhai · Dennis Cai
ACROBAT: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Performance and Memory
Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Performance and Memory
Size Zheng · Renze Chen · Meng Li · Zihao Ye · Luis Ceze · Yun Liang
VQPy: An Object-Oriented Approach to Modern Video Analytics
ML for Systems
Shan Yu · Zhenting Zhu · Yu Chen · Hanchen Xu · Pengzhan Zhao · Yang Wang · Arthi Padmanabhan · Hugo Latapie · Harry Xu
Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference
Large Language Models 2
Muhammad Adnan · Akhil Arunkumar · Gaurav Jain · Prashant Nair · Ilya Soloveychik · Purushotham Kamath
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Large Language Models 2
In Gim · Guojun Chen · Seung-seob Lee · Nikhil Sarda · Anurag Khandelwal · Lin Zhong
Schrodinger's FP: Training Neural Networks with Dynamic Floating-Point Containers
Quantization and Compression 2
Milos Nikolic · Enrique Torres Sanchez · Jiahui Wang · Ali Hadi Zadeh · Mostafa Mahmoud · Ameer Abdelhadi · Kareem Ibrahim · Andreas Moshovos
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
Parallel and Distributed 2
Chenyu Jiang · Ye Tian · Zhen Jia · Chuan Wu · Yida Wang · Shuai Zheng
Proteus: Preserving Model Confidentiality during Graph Optimizations
Privacy and Security
Yubo Gao · Maryam Haghifam · Christina Giannoula · Renbo Tu · Gennady Pekhimenko · Nandita Vijaykumar
QMoE: Sub-1-Bit Compression of Trillion Parameter Models
Quantization and Compression 1
Elias Frantar · Dan Alistarh
HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Parallel and Distributed 2
Xuanlei Zhao · Bin Jia · Haotian Zhou · Ziming Liu · Shenggan Cheng · Yang You
Distributed Matrix-Based Sampling for Graph Neural Network Training
Parallel and Distributed 1
Alok Tripathy · Katherine Yelick · Aydin Buluc
FedTrans: Efficient Federated Learning via Multi-Model Transformation
Federated Learning
Yuxuan Zhu · Jiachen Liu · Mosharaf Chowdhury · Fan Lai
COMET: Neural Cost Model Explanation Framework
Measurement and Analysis
Isha Chaudhary · Alex Renda · Charith Mendis · Gagandeep Singh
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving
Quantization and Compression 1
Yilong Zhao · Chien-Yu Lin · Kan Zhu · Zihao Ye · Lequn Chen · Size Zheng · Luis Ceze · Arvind Krishnamurthy · Tianqi Chen · Baris Kasikci
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
ML for Systems
Haoran Qiu · Weichao Mao · Archit Patke · Shengkun Cui · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravi Iyer
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Quantization and Compression 2
Jian Meng · Yuan Liao · Anupreetham Anupreetham · Ahmed Hasssan · Shixing Yu · Han-sok Suh · Xiaofeng Hu · Jae-sun Seo
LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning
Federated Learning
Shixiong Qi · K. K. Ramakrishnan · Myungjin Lee
Does Compressing Activations Help Model Parallel Training?
Measurement and Analysis
Song Bian · Dacheng Li · Hongyi Wang · Eric Xing · Shivaram Venkataraman
Accelerating ReLU for MPC-Based Private Inference with a Communication-Efficient Sign Estimation
Privacy and Security
Kiwan Maeng · G. Edward Suh
Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption
Privacy and Security
Jingtian Dang · Jianming Tong · Anupam Golder · Cong "Callie" Hao · Arijit Raychowdhury · Tushar Krishna
Vidur: A Large-Scale Simulation Framework for LLM Inference
Measurement and Analysis
Amey Agrawal · Nitin Kedia · Jayashree Mohan · Ashish Panwar · Nipun Kwatra · Bhargav Gulavani · Ramachandran Ramjee · Alexey Tumanov
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
Large Language Models 2
Ke Hong · Guohao Dai · Jiaming Xu · Qiuli Mao · Xiuhong Li · Jun Liu · Kangdi Chen · Yuhan Dong · Yu Wang
Punica: Multi-Tenant LoRA Serving
Large Language Models 1
Lequn Chen · Zihao Ye · Yongji Wu · Danyang Zhuo · Luis Ceze · Arvind Krishnamurthy
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Performance and Memory
Zhixu Du · Shiyu Li · Yuhao Wu · Xiangyu Jiang · Jingwei Sun · Qilin Zheng · Yongkai Wu · Ang Li · Hai Li · Yiran Chen
On Latency Predictors for Neural Architecture Search
ML for Systems
Yash Akhauri · Mohamed Abdelfattah
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
Quantization and Compression 1
Ji Lin · Jiaming Tang · Haotian Tang · Shang Yang · Wei-Ming Chen · Wei-Chen Wang · Guangxuan Xiao · Xingyu Dang · Chuang Gan · Song Han
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation
Parallel and Distributed 2
Liang Luo · Buyun Zhang · Michael Tsang · Yinbin Ma · Ching-Hsiang Chu · Yuxin Chen · Shen Li · Yuchen Hao · Yanli Zhao · Guna Lakshminarayanan · Ellie Wen · Jongsoo Park · Dheevatsa Mudigere · Maxim Naumov
JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
Quantization and Compression 2
Mohamed Ibrahim · Shaizeen Aga · Ada Li · Suchita Pati · Mahzabeen Islam
Fine-Tuning Language Models Using Formal Methods Feedback: A Use Case in Autonomous Systems
Large Language Models 1
Yunhao Yang · Neel P. Bhatt · Tyler Ingebrand · William Ward · Steven Carr · Atlas Wang · Ufuk Topcu
UniDM: A Unified Framework for Data Manipulation with Large Language Models
ML for Systems
Yichen Qian · Yongyi He · Rong Zhu · Jintao Huang · Zhijian Ma · Haibin Wang · Yaohua Wang · Xiuyu Sun · Defu Lian · Bolin Ding · Jingren Zhou
L-GreCo: Layerwise-adaptive Gradient Compression For Efficient Data-parallel Deep Learning
Parallel and Distributed 1
Ilia Markov · Kaveh Alim · Elias Frantar · Dan Alistarh