MLSys 2024 List of Accepted Papers
|
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
LLM 2
In Gim ⋅ Guojun Chen ⋅ Seung-seob Lee ⋅ Nikhil Sarda ⋅ Anurag Khandelwal ⋅ Lin Zhong
|
Poster Position Number 25 | |
|
Efficient Post-training Quantization with FP8 Formats
Quantization and Compression 2
Haihao Shen ⋅ Naveen Mellempudi ⋅ Xin He ⋅ Qun Gao ⋅ Chang Wang ⋅ Mengni Wang
|
Poster Position Number 28 | |
|
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Performance and Memory
Zhixu Du ⋅ Shiyu Li ⋅ Yuhao Wu ⋅ Xiangyu Jiang ⋅ Jingwei Sun ⋅ Qilin Zheng ⋅ Yongkai Wu ⋅ Ang Li ⋅ Hai Li ⋅ Yiran Chen
|
Poster Position Number 30 | |
|
LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning
Federated Learning
Shixiong Qi ⋅ K. K. Ramakrishnan ⋅ Myungjin Lee
|
Poster Position Number 3 | |
|
ACROBAT: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Performance and Memory
Pratik Fegade ⋅ Tianqi Chen ⋅ Phillip Gibbons ⋅ Todd Mowry
|
Poster Position Number 6 | |
|
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
Parallel and Distributed 2
Chenyu Jiang ⋅ Ye Tian ⋅ Zhen Jia ⋅ Chuan Wu ⋅ Yida Wang ⋅ Shuai Zheng
|
Poster Position Number 19 | |
|
Does Compressing Activations Help Model Parallel Training?
Measurement and Analysis
Song Bian ⋅ Dacheng Li ⋅ Hongyi Wang ⋅ Eric Xing ⋅ Shivaram Venkataraman
|
Poster Position Number 35 | |
|
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
LLM 2
Ke Hong ⋅ Guohao Dai ⋅ Jiaming Xu ⋅ Qiuli Mao ⋅ Xiuhong Li ⋅ Jun Liu ⋅ Kangdi Chen ⋅ Yuhan Dong ⋅ Yu Wang
|
Poster Position Number 33 | |
|
Vidur: A Large-Scale Simulation Framework for LLM Inference
Measurement and Analysis
Amey Agrawal ⋅ Nitin Kedia ⋅ Jayashree Mohan ⋅ Ashish Panwar ⋅ Nipun Kwatra ⋅ Bhargav Gulavani ⋅ Ramachandran Ramjee ⋅ Alexey Tumanov
|
Poster Position Number 1 | |
|
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
Measurement and Analysis
Yifei Xu ⋅ Yuning Chen ⋅ Xumiao Zhang ⋅ Xianshang Lin ⋅ Pan Hu ⋅ Yunfei Ma ⋅ Songwu Lu ⋅ Wan Du ⋅ Zhuoqing Mao ⋅ Ennan Zhai ⋅ Dennis Cai
|
Poster Position Number 2 | |
|
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
ML for Systems
Haoran Qiu ⋅ Weichao Mao ⋅ Archit Patke ⋅ Shengkun Cui ⋅ Chen Wang ⋅ Hubertus Franke ⋅ Zbigniew Kalbarczyk ⋅ Tamer Basar ⋅ Ravi Iyer
|
Poster Position Number 4 | |
|
UniDM: A Unified Framework for Data Manipulation with Large Language Models
ML for Systems
Yichen Qian ⋅ Yongyi He ⋅ Rong Zhu ⋅ Jintao Huang ⋅ Zhijian Ma ⋅ Haibin Wang ⋅ Yaohua Wang ⋅ Xiuyu Sun ⋅ Defu Lian ⋅ Bolin Ding ⋅ Jingren Zhou
|
Poster Position Number 5 | |
|
FedTrans: Efficient Federated Learning via Multi-Model Transformation
Federated Learning
Yuxuan Zhu ⋅ Jiachen Liu ⋅ Mosharaf Chowdhury ⋅ Fan Lai
|
Poster Position Number 7 | |
|
SLoRA: Scalable Serving of Thousands of LoRA Adapters
Large Language Models 1
Ying Sheng ⋅ Shiyi Cao ⋅ Dacheng Li ⋅ Coleman Hooper ⋅ Nicholas Lee ⋅ Shuo Yang ⋅ Christopher Chou ⋅ Banghua Zhu ⋅ Lianmin Zheng ⋅ Kurt Keutzer ⋅ Joseph Gonzalez ⋅ Ion Stoica
|
Poster Position Number 9 | |
|
JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
Quantization and Compression 2
Mohamed Ibrahim ⋅ Shaizeen Aga ⋅ Ada Li ⋅ Suchita Pati ⋅ Mahzabeen Islam
|
Poster Position Number 8 | |
|
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
Federated Learning
Gyudong Kim ⋅ Mehdi Ghasemi ⋅ Soroush Heidari ⋅ Seungryong Kim ⋅ Young Geun Kim ⋅ Sarma Vrudhula ⋅ Carole-Jean Wu
|
Poster Position Number 10 | |
|
Fine-Tuning Language Models Using Formal Methods Feedback: A Use Case in Autonomous Systems
Large Language Models 1
Yunhao Yang ⋅ Neel P. Bhatt ⋅ Tyler Ingebrand ⋅ William Ward ⋅ Steven Carr ⋅ Atlas Wang ⋅ Ufuk Topcu
|
Poster Position Number 11 | |
|
Distributed Matrix-Based Sampling for Graph Neural Network Training
Parallel and Distributed 1
Alok Tripathy ⋅ Katherine Yelick ⋅ Aydin Buluc
|
Poster Position Number 12 | |
|
Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving
Quantization and Compression 1
Yilong Zhao ⋅ Chien-Yu Lin ⋅ Kan Zhu ⋅ Zihao Ye ⋅ Lequn Chen ⋅ Size Zheng ⋅ Luis Ceze ⋅ Arvind Krishnamurthy ⋅ Tianqi Chen ⋅ Baris Kasikci
|
Poster Position Number 13 | |
|
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Performance and Memory
Size Zheng ⋅ Renze Chen ⋅ Meng Li ⋅ Zihao Ye ⋅ Luis Ceze ⋅ Yun Liang
|
Poster Position Number 14 | |
|
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
Ji Lin ⋅ Jiaming Tang ⋅ Haotian Tang ⋅ Shang Yang ⋅ Wei-Ming Chen ⋅ Wei-Chen Wang ⋅ Guangxuan Xiao ⋅ Xingyu Dang ⋅ Chuang Gan ⋅ Song Han
|
Poster Position Number 15 | |
|
VQPy: An Object-Oriented Approach to Modern Video Analytics
ML for Systems
Shan Yu ⋅ Zhenting Zhu ⋅ Yu Chen ⋅ Hanchen Xu ⋅ Pengzhan Zhao ⋅ Yang Wang ⋅ Arthi Padmanabhan ⋅ Hugo Latapie ⋅ Harry Xu
|
Poster Position Number 16 | |
|
COMET: Neural Cost Model Explanation Framework
Measurement and Analysis
Isha Chaudhary ⋅ Alex Renda ⋅ Charith Mendis ⋅ Gagandeep Singh
|
Poster Position Number 17 | |
|
Schrodinger's FP: Training Neural Networks with Dynamic Floating-Point Containers
Quantization and Compression 2
Milos Nikolic ⋅ Enrique Torres Sanchez ⋅ Jiahui Wang ⋅ Ali Hadi Zadeh ⋅ Mostafa Mahmoud ⋅ Ameer Abdelhadi ⋅ Kareem Ibrahim ⋅ Andreas Moshovos
|
Poster Position Number 18 | |
|
Accelerating ReLU for MPC-Based Private Inference with a Communication-Efficient Sign Estimation
Privacy and Security
Kiwan Maeng ⋅ G. Edward Suh
|
Poster Position Number 20 | |
|
L-GreCo: Layerwise-adaptive Gradient Compression For Efficient Data-parallel Deep Learning
Parallel and Distributed 1
Ilia Markov ⋅ Kaveh Alim ⋅ Elias Frantar ⋅ Dan Alistarh
|
Poster Position Number 21 | |
|
Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference
LLM 2
Muhammad Adnan ⋅ Akhil Arunkumar ⋅ Gaurav Jain ⋅ Prashant Nair ⋅ Ilya Soloveychik ⋅ Purushotham Kamath
|
Poster Position Number 22 | |
|
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Quantization and Compression 2
Jian Meng ⋅ Yuan Liao ⋅ Anupreetham Anupreetham ⋅ Ahmed Hasssan ⋅ Shixing Yu ⋅ Han-sok Suh ⋅ Xiaofeng Hu ⋅ Jae-sun Seo
|
Poster Position Number 23 | |
|
Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption
Privacy and Security
Jingtian Dang ⋅ Jianming Tong ⋅ Anupam Golder ⋅ Cong "Callie" Hao ⋅ Arijit Raychowdhury ⋅ Tushar Krishna
|
Poster Position Number 24 | |
|
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache
Large Language Models 1
Zhenyu Zhang ⋅ Shiwei Liu ⋅ Runjin Chen ⋅ Bhavya Kailkhura ⋅ Beidi Chen ⋅ Atlas Wang
|
Poster Position Number 26 | |
|
Proteus: Preserving Model Confidentiality during Graph Optimizations
Privacy and Security
Yubo Gao ⋅ Maryam Haghifam ⋅ Christina Giannoula ⋅ Renbo Tu ⋅ Gennady Pekhimenko ⋅ Nandita Vijaykumar
|
Poster Position Number 27 | |
|
On Latency Predictors for Neural Architecture Search
ML for Systems
Yash Akhauri ⋅ Mohamed Abdelfattah
|
Poster Position Number 29 | |
|
QMoE: Sub-1-Bit Compression of Trillion Parameter Models
Quantization and Compression 1
Elias Frantar ⋅ Dan Alistarh
|
Poster Position Number 31 | |
|
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
Parallel and Distributed 1
Ye Tian ⋅ Zhen Jia ⋅ Ziyue Luo ⋅ Yida Wang ⋅ Chuan Wu
|
Poster Position Number 32 | |
|
Punica: Multi-Tenant LoRA Serving
Large Language Models 1
Lequn Chen ⋅ Zihao Ye ⋅ Yongji Wu ⋅ Danyang Zhuo ⋅ Luis Ceze ⋅ Arvind Krishnamurthy
|
Poster Position Number 34 | |
|
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation
Parallel and Distributed 2
Liang Luo ⋅ Buyun Zhang ⋅ Michael Tsang ⋅ Yinbin Ma ⋅ Ching-Hsiang Chu ⋅ Yuxin Chen ⋅ Shen Li ⋅ Yuchen Hao ⋅ Yanli Zhao ⋅ Guna Lakshminarayanan ⋅ Ellie Wen ⋅ Jongsoo Park ⋅ Dheevatsa Mudigere ⋅ Maxim Naumov
|
Poster Position Number 36 | |
|
HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Parallel and Distributed 2
Xuanlei Zhao ⋅ Bin Jia ⋅ Haotian Zhou ⋅ Ziming Liu ⋅ Shenggan Cheng ⋅ Yang You
|
Poster Position Number 37 |