MLSys 2024 List of Accepted Papers
| 
                    
                        SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
                    
                    
                    Performance and Memory 
                        
                            
                                Zhixu Du · Shiyu Li · Yuhao Wu · Xiangyu Jiang · Jingwei Sun · Qilin Zheng · Yongkai Wu · Ang Li · Hai Li · Yiran Chen
                            
                        
                     
                 | 
                Poster Position Number 30 | |
| 
                    
                        Does Compressing Activations Help Model Parallel Training?
                    
                    
                    Measurement and Analysis 
                        
                            
                                Song Bian · Dacheng Li · Hongyi Wang · Eric Xing · Shivaram Venkataraman
                            
                        
                     
                 | 
                Poster Position Number 35 | |
| 
                    
                        FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
                    
                    
                    LLM 2 
                        
                            
                                Ke Hong · Guohao Dai · Jiaming Xu · Qiuli Mao · Xiuhong Li · Jun Liu · Kangdi Chen · Yuhan Dong · Yu Wang
                            
                        
                     
                 | 
                Poster Position Number 33 | |
| 
                    
                        Vidur: A Large-Scale Simulation Framework for LLM Inference
                    
                    
                    Measurement and Analysis 
                        
                            
                                Amey Agrawal · Nitin Kedia · Jayashree Mohan · Ashish Panwar · Nipun Kwatra · Bhargav Gulavani · Ramachandran Ramjee · Alexey Tumanov
                            
                        
                     
                 | 
                Poster Position Number 1 | |
| 
                    
                        CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
                    
                    
                    Measurement and Analysis 
                        
                            
                                Yifei Xu · Yuning Chen · Xumiao Zhang · Xianshang Lin · Pan Hu · Yunfei Ma · Songwu Lu · Wan Du · Zhuoqing Mao · Ennan Zhai · Dennis Cai
                            
                        
                     
                 | 
                Poster Position Number 2 | |
| 
                    
                        LIFL: A Lightweight, Event-driven Serverless Platform for Federated Learning
                    
                    
                    Federated Learning 
                        
                            
                                Shixiong Qi · K. K. Ramakrishnan · Myungjin Lee
                            
                        
                     
                 | 
                Poster Position Number 3 | |
| 
                    
                        FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
                    
                    
                    ML for Systems 
                        
                            
                                Haoran Qiu · Weichao Mao · Archit Patke · Shengkun Cui · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravi Iyer
                            
                        
                     
                 | 
                Poster Position Number 4 | |
| 
                    
                        UniDM: A Unified Framework for Data Manipulation with Large Language Models
                    
                    
                    ML for Systems 
                        
                            
                                Yichen Qian · Yongyi He · Rong Zhu · Jintao Huang · Zhijian Ma · Haibin Wang · Yaohua Wang · Xiuyu Sun · Defu Lian · Bolin Ding · Jingren Zhou
                            
                        
                     
                 | 
                Poster Position Number 5 | |
| 
                    
                        ACROBAT: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
                    
                    
                    Performance and Memory 
                        
                            
                                Pratik Fegade · Tianqi Chen · Phillip Gibbons · Todd Mowry
                            
                        
                     
                 | 
                Poster Position Number 6 | |
| 
                    
                        FedTrans: Efficient Federated Learning via Multi-Model Transformation
                    
                    
                    Federated Learning 
                        
                            
                                Yuxuan Zhu · Jiachen Liu · Mosharaf Chowdhury · Fan Lai
                            
                        
                     
                 | 
                Poster Position Number 7 | |
| 
                    
                        JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
                    
                    
                    Quantization and Compression 2 
                        
                            
                                Mohamed Ibrahim · Shaizeen Aga · Ada Li · Suchita Pati · Mahzabeen Islam
                            
                        
                     
                 | 
                Poster Position Number 8 | |
| 
                    
                        SLoRA: Scalable Serving of Thousands of LoRA Adapters
                    
                    
                    Large Language Models 1 
                        
                            
                                Ying Sheng · Shiyi Cao · Dacheng Li · Coleman Hooper · Nicholas Lee · Shuo Yang · Christopher Chou · Banghua Zhu · Lianmin Zheng · Kurt Keutzer · Joseph Gonzalez · Ion Stoica
                            
                        
                     
                 | 
                Poster Position Number 9 | |
| 
                    
                        HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
                    
                    
                    Federated Learning 
                        
                            
                                Gyudong Kim · Mehdi Ghasemi · Soroush Heidari · Seungryong Kim · Young Geun Kim · Sarma Vrudhula · Carole-Jean Wu
                            
                        
                     
                 | 
                Poster Position Number 10 | |
| 
                    
                        Fine-Tuning Language Models Using Formal Methods Feedback: A Use Case in Autonomous Systems
                    
                    
                    Large Language Models 1 
                        
                            
                                Yunhao Yang · Neel P. Bhatt · Tyler Ingebrand · William Ward · Steven Carr · Atlas Wang · Ufuk Topcu
                            
                        
                     
                 | 
                Poster Position Number 11 | |
| 
                    
                        Distributed Matrix-Based Sampling for Graph Neural Network Training
                    
                    
                    Parallel and Distributed 1 
                        
                            
                                Alok Tripathy · Katherine Yelick · Aydin Buluc
                            
                        
                     
                 | 
                Poster Position Number 12 | |
| 
                    
                        Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving
                    
                    
                    Quantization and Compression 1 
                        
                            
                                Yilong Zhao · Chien-Yu Lin · Kan Zhu · Zihao Ye · Lequn Chen · Size Zheng · Luis Ceze · Arvind Krishnamurthy · Tianqi Chen · Baris Kasikci
                            
                        
                     
                 | 
                Poster Position Number 13 | |
| 
                    
                        vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
                    
                    
                    Performance and Memory 
                        
                            
                                Size Zheng · Renze Chen · Meng Li · Zihao Ye · Luis Ceze · Yun Liang
                            
                        
                     
                 | 
                Poster Position Number 14 | |
| 
                    
                        AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
                    
                    
                         
                        
                            
                                Ji Lin · Jiaming Tang · Haotian Tang · Shang Yang · Wei-Ming Chen · Wei-Chen Wang · Guangxuan Xiao · Xingyu Dang · Chuang Gan · Song Han
                            
                        
                     
                 | 
                Poster Position Number 15 | |
| 
                    
                        VQPy: An Object-Oriented Approach to Modern Video Analytics
                    
                    
                    ML for Systems 
                        
                            
                                Shan Yu · Zhenting Zhu · Yu Chen · Hanchen Xu · Pengzhan Zhao · Yang Wang · Arthi Padmanabhan · Hugo Latapie · Harry Xu
                            
                        
                     
                 | 
                Poster Position Number 16 | |
| 
                    
                        COMET: Neural Cost Model Explanation Framework
                    
                    
                    Measurement and Analysis 
                        
                            
                                Isha Chaudhary · Alex Renda · Charith Mendis · Gagandeep Singh
                            
                        
                     
                 | 
                Poster Position Number 17 | |
| 
                    
                        Schrodinger's FP: Training Neural Networks with Dynamic Floating-Point Containers
                    
                    
                    Quantization and Compression 2 
                        
                            
                                Milos Nikolic · Enrique Torres Sanchez · Jiahui Wang · Ali Hadi Zadeh · Mostafa Mahmoud · Ameer Abdelhadi · Kareem Ibrahim · Andreas Moshovos
                            
                        
                     
                 | 
                Poster Position Number 18 | |
| 
                    
                        Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
                    
                    
                    Parallel and Distributed 2 
                        
                            
                                Chenyu Jiang · Ye Tian · Zhen Jia · Chuan Wu · Yida Wang · Shuai Zheng
                            
                        
                     
                 | 
                Poster Position Number 19 | |
| 
                    
                        Accelerating ReLU for MPC-Based Private Inference with a Communication-Efficient Sign Estimation
                    
                    
                    Privacy and security 
                        
                            
                                Kiwan Maeng · G. Edward Suh
                            
                        
                     
                 | 
                Poster Position Number 20 | |
| 
                    
                        L-GreCo: Layerwise-adaptive Gradient Compression For Efficient Data-parallel Deep Learning
                    
                    
                    Parallel and Distributed 1 
                        
                            
                                Ilia Markov · Kaveh Alim · Elias Frantar · Dan Alistarh
                            
                        
                     
                 | 
                Poster Position Number 21 | |
| 
                    
                        Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference
                    
                    
                    LLM 2 
                        
                            
                                Muhammad Adnan · Akhil Arunkumar · Gaurav Jain · Prashant Nair · Ilya Soloveychik · Purushotham Kamath
                            
                        
                     
                 | 
                Poster Position Number 22 | |
| 
                    
                        Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
                    
                    
                    Quantization and Compression 2 
                        
                            
                                Jian Meng · Yuan Liao · Anupreetham Anupreetham · Ahmed Hasssan · Shixing Yu · Han-sok Suh · Xiaofeng Hu · Jae-sun Seo
                            
                        
                     
                 | 
                Poster Position Number 23 | |
| 
                    
                        Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption
                    
                    
                    Privacy and security 
                        
                            
                                Jingtian Dang · Jianming Tong · Anupam Golder · Cong "Callie" Hao · Arijit Raychowdhury · Tushar Krishna
                            
                        
                     
                 | 
                Poster Position Number 24 | |
| 
                    
                        Prompt Cache: Modular Attention Reuse for Low-Latency Inference
                    
                    
                    LLM 2 
                        
                            
                                In Gim · Guojun Chen · Seung-seob Lee · Nikhil Sarda · Anurag Khandelwal · Lin Zhong
                            
                        
                     
                 | 
                Poster Position Number 25 | |
| 
                    
                        Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache
                    
                    
                    Large Language Models 1 
                        
                            
                                Zhenyu Zhang · Shiwei Liu · Runjin Chen · Bhavya Kailkhura · Beidi Chen · Atlas Wang
                            
                        
                     
                 | 
                Poster Position Number 26 | |
| 
                    
                        Proteus: Preserving Model Confidentiality during Graph Optimizations
                    
                    
                    Privacy and security 
                        
                            
                                Yubo Gao · Maryam Haghifam · Christina Giannoula · Renbo Tu · Gennady Pekhimenko · Nandita Vijaykumar
                            
                        
                     
                 | 
                Poster Position Number 27 | |
| 
                    
                        Efficient Post-training Quantization with FP8 Formats
                    
                    
                    Quantization and Compression 2 
                        
                            
                                Haihao Shen · Naveen Mellempudi · Xin He · Qun Gao · Chang Wang · Mengni Wang
                            
                        
                     
                 | 
                Poster Position Number 28 | |
| 
                    
                        On Latency Predictors for Neural Architecture Search
                    
                    
                    ML for Systems 
                        
                            
                                Yash Akhauri · Mohamed Abdelfattah
                            
                        
                     
                 | 
                Poster Position Number 29 | |
| 
                    
                        QMoE: Sub-1-Bit Compression of Trillion Parameter Models
                    
                    
                    Quantization and Compression 1 
                        
                            
                                Elias Frantar · Dan Alistarh
                            
                        
                     
                 | 
                Poster Position Number 31 | |
| 
                    
                        DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
                    
                    
                    Parallel and Distributed 1 
                        
                            
                                Ye Tian · Zhen Jia · Ziyue Luo · Yida Wang · Chuan Wu
                            
                        
                     
                 | 
                Poster Position Number 32 | |
| 
                    
                        Punica: Multi-Tenant LoRA Serving
                    
                    
                    Large Language Models 1 
                        
                            
                                Lequn Chen · Zihao Ye · Yongji Wu · Danyang Zhuo · Luis Ceze · Arvind Krishnamurthy
                            
                        
                     
                 | 
                Poster Position Number 34 | |
| 
                    
                        Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation
                    
                    
                    Parallel and Distributed 2 
                        
                            
                                Liang Luo · Buyun Zhang · Michael Tsang · Yinbin Ma · Ching-Hsiang Chu · Yuxin Chen · Shen Li · Yuchen Hao · Yanli Zhao · Guna Lakshminarayanan · Ellie Wen · Jongsoo Park · Dheevatsa Mudigere · Maxim Naumov
                            
                        
                     
                 | 
                Poster Position Number 36 | |
| 
                    
                        HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
                    
                    
                    Parallel and Distributed 2 
                        
                            
                                Xuanlei Zhao · Bin Jia · Haotian Zhou · Ziming Liu · Shenggan Cheng · Yang You
                            
                        
                     
                 | 
                Poster Position Number 37 |