MLSys 2025 List of Accepted Papers



A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers Session 6: Edge and Cloud Systems Chenxi Yang · Yan Li · Martin Maas · Mustafa Uysal · Ubaid Hafeez · Arif Merchant · Richard McDougall		Mission City Ballroom #18
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Session 3: Quantization and Sparsity Yujun Lin · Haotian Tang · Shang Yang · Zhekai Zhang · Guangxuan Xiao · Chuang Gan · Song Han		Mission City Ballroom #1
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling Session 1: LLM and Diffusion Model Serving Sohaib Ahmad · Qizheng Yang · Haoliang Wang · Ramesh Sitaraman · Hui Guan		Mission City Ballroom #2
FlexAttention: A Programming Model for Generating Fused Attention Variants Session 10: LLM and Diffusion Model Serving Juechu Dong · Boyuan Feng · Driss Guessous · Yanbo Liang · Horace He		Mission City Ballroom #3
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments Session 10: LLM and Diffusion Model Serving Youhe Jiang · Fangcheng Fu · Xiaozhe Yao · Taiyi Wang · Bin Cui · Ana Klimovic · Eiko Yoneki		Mission City Ballroom #5
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training Session 2: Parallel and Distributed Systems Daiyaan Arfeen · Zhen Zhang · Xinwei Fu · Gregory R. Ganger · Yida Wang		Mission City Ballroom #6
SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations Session 7: Quantization and Sparsity Md Saidul Hoque Anik · Ariful Azad		Mission City Ballroom #7
FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference Session 1: LLM and Diffusion Model Serving Zaifeng Pan · Yitong Ding · Yue Guan · Zheng Wang · Zhongkai Yu · Xulong Tang · Yida Wang · Yufei Ding		Mission City Ballroom #11
ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription in the Cloud Session 6: Edge and Cloud Systems Lu Wang · Mayukh Das · Fangkai Yang · Bo Qiao · Hang Dong · Si Qin · Victor Ruehle · Chetan Bansal · Eli Cortez · Íñigo Goiri · S R · Qingwei Lin · Dongmei Zhang		Mission City Ballroom #12
Supply-Chain Attacks in Machine Learning Frameworks Session 12: Edge and Cloud Systems Yue Gao · Ilia Shumailov · Kassem Fawaz		Mission City Ballroom #13
Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs Session 12: Edge and Cloud Systems Zichao Yue · Chenhui Deng · Zhiru Zhang		Mission City Ballroom #15
LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers Session 1: LLM and Diffusion Model Serving Rya Sanovar · Srikant Bharadwaj · Renée St. Amant · Victor Ruehle · Saravan Rajmohan		Mission City Ballroom #20
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer Session 5: LLM training and fine-tuning Jinghan Yao · Sam Jacobs · Masahiro Tanaka · Olatunji Ruwase · Hari Subramoni · Dhabaleswar Panda		Mission City Ballroom #21
Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers Session 7: Quantization and Sparsity Francesco Daghero · Daniele Jahier Pagliari · Francesco Conti · Luca Benini · Massimo Poncino · Alessio Burrello		Mission City Ballroom #22
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Session 3: Quantization and Sparsity Beichen Huang · Yueming Yuan · Zelei Shao · Minjia Zhang		Mission City Ballroom #23
Efficient On-Device Machine Learning with a Biologically-Plausible Forward-Only Algorithm Session 6: Edge and Cloud Systems Baichuan Huang · Amir Aminifar		Mission City Ballroom #25
FLStore: Efficient Federated Learning Storage for Non-training Workloads Session 11: Federated Learning Ahmad Faraz Khan · Samuel Fountain · Ahmed Mohamed Abdelmoniem Sayed · Ali R. Butt · Ali Anwar		Mission City Ballroom #26
Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators Session 3: Quantization and Sparsity Geonhwa Jeong · Po-An Tsai · Abhimanyu Rajeshkumar Bambhaniya · Stephen Keckler · Tushar Krishna		Mission City Ballroom #27
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Session 1: LLM and Diffusion Model Serving Zihao Ye · Lequn Chen · Ruihang Lai · Wuwei Lin · Yineng Zhang · Stephanie Wang · Tianqi Chen · Baris Kasikci · Vinod Grover · Arvind Krishnamurthy · Luis Ceze		Mission City Ballroom #30
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention Session 7: Quantization and Sparsity Qianchao Zhu · Jiangfei Duan · Chang Chen · Siran Liu · Xiuhong Li · Guanyu Feng · Xin Lv · Xiao Chuanfu · Dahua Lin · Chao Yang		Mission City Ballroom #31
Scaling Deep Learning Training with MPMD Pipeline Parallelism Session 9: Parallel and Distributed Systems Anxhelo Xhebraj · Sean Lee · Hanfeng Chen · Vinod Grover		Mission City Ballroom #32
Radius: Range-based Gradient Sparsity for Large Foundation Model Pre-training Session 3: Quantization and Sparsity Mingkai Zheng · Zhao Zhang		Mission City Ballroom #33
Context Parallelism for Scalable Million-Token Inference Session 2: Parallel and Distributed Systems Amy Yang · Jingyi Yang · Aya Ibrahim · Xinfeng Xie · Bangsheng Tang · Grigory Sizov · Jongsoo Park · Jianyu Huang		Mission City Ballroom #34
Seesaw: High-throughput LLM Inference via Model Re-sharding Session 8: LLM and Diffusion Model Serving Qidong Su · Wei Zhao · Xin Li · Muralidhar Andoorveedu · Chenhao Jiang · Zhanda Zhu · Kevin Song · Christina Giannoula · Gennady Pekhimenko		Mission City Ballroom #36
ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation Session 8: LLM and Diffusion Model Serving Jiacheng Yang · Jun Wu · Zhen Zhang · Xinwei Fu · Zhiying Xu · Zhen Jia · Yida Wang · Gennady Pekhimenko		Mission City Ballroom #37
TurboAttention: Efficient Attention Approximation for High Throughputs LLM Session 8: LLM and Diffusion Model Serving Hao Kang · Srikant Bharadwaj · James Hensman · Tushar Krishna · Victor Ruehle · Saravan Rajmohan		Mission City Ballroom #39
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism Session 2: Parallel and Distributed Systems Sandeep Polisetty · Juelin Liu · Yi Fung · Seung-Hwan Lim · Hui Guan · Marco Serafini		Mission City Ballroom #40
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives Session 9: Parallel and Distributed Systems Size Zheng · Jin Fang · Xuegui Zheng · Qi Hou · Wenlei Bao · Ningxin Zheng · Ziheng Jiang · Dongyang Wang · Jianxi Ye · Haibin Lin · Li-Wen Chang · Xin Liu		Mission City Ballroom #41
Self-Data Distillation for Recovering Quality in Pruned Large Language Models Session 3: Quantization and Sparsity Vithursan Thangarasa · Ganesh Venkatesh · Mike Lasby · Nish Sinnadurai · Sean Lie		Mission City Ballroom #42
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices Session 11: Federated Learning Mohammadali Shakerdargah · Shan Lu · Chao Gao · Di Niu		Mission City Ballroom #44
Interference-aware Edge Runtime Prediction with Conformal Matrix Completion Session 4: Reliable and Scalable Systems Tianshu Huang · Arjun Ramesh · Emily Ruppel · Nuno Pereira · Anthony Rowe · Carlee Joe-Wong		Mission City Ballroom #47
APOLLO: SGD-like Memory, AdamW-level Performance Session 5: LLM training and fine-tuning Hanqing Zhu · Zhenyu Zhang · Wenyan Cong · Xi Liu · Sem Park · Vikas Chandra · Bo Long · David Pan · Atlas Wang · Jinwon Lee		Mission City Ballroom #48
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training Session 5: LLM training and fine-tuning Mingyu Liang · Hiwot Kassa · Wenyin Fu · Brian Coutinho · Louis Feng · Christina Delimitrou		Mission City Ballroom #49
The Hidden Bloat in Machine Learning Systems Session 4: Reliable and Scalable Systems Huaifeng Zhang · Ahmed Ali-Eldin Hassan		Mission City Ballroom #51
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving Session 1: LLM and Diffusion Model Serving Wei Gao · Xinyu Zhou · Peng Sun · Tianwei Zhang · Yonggang Wen		Mission City Ballroom #53
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Session 10: LLM and Diffusion Model Serving Yixin Dong · Charlie Ruan · Yaxing Cai · Ziyi Xu · Yilong Zhao · Ruihang Lai · Tianqi Chen		Mission City Ballroom #54
On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions Session 9: Parallel and Distributed Systems Maximilian Böther · Abe Sebastian · Pranjal Awasthi · Ana Klimovic · Srikumar Ramalingam		Mission City Ballroom #56
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling Session 2: Parallel and Distributed Systems Xinyi Zhang · Hanyu Zhao · Wencong Xiao · Xianyan Jia · Fei Xu · Yong Li · Wei Lin · Fangming Liu		Mission City Ballroom #57
SOLA: Optimizing SLO Attainment for Large Language Model Serving with State-Aware Scheduling Session 8: LLM and Diffusion Model Serving Ke Hong · Xiuhong Li · Lufang Chen · Qiuli Mao · Guohao Dai · Xuefei Ning · Shengen Yan · Yun Liang · Yu Wang		Mission City Ballroom #58
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Session 10: LLM and Diffusion Model Serving Xuanlin Jiang · Yang Zhou · Shiyi Cao · Ion Stoica · Minlan Yu		Mission City Ballroom #59
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine Session 2: Parallel and Distributed Systems Carlo Siebenschuh · Kyle Hippe · Ozan Gokdemir · Alexander Brace · Arham Khan · Khalid Hossain · Yadu Babuji · Nicholas Chia · Venkatram Vishwanath · Arvind Ramanathan · Rick Stevens · Ian Foster · Robert Underwood		Mission City Ballroom #60
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation Session 5: LLM training and fine-tuning Zhiyu Mei · Wei Fu · Kaiwei Li · Guangju Wang · Huanchen Zhang · Yi Wu		Mission City Ballroom #61
Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework Session 4: Reliable and Scalable Systems Neel P. Bhatt · Yunhao Yang · Rohan Siva · Daniel Milan · Ufuk Topcu · Atlas Wang		Mission City Ballroom #16
VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-Resolution Session 12: Edge and Cloud Systems Chendong Wang · Anlan Zhang · Yifan Yang · Lili Qiu · Yuqing Yang · Xinyang Jiang · Feng Qian · Suman Banerjee		Mission City Ballroom #14
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds Session 4: Reliable and Scalable Systems Yinfang Chen · Manish Shetty · Gagan Somashekar · Minghua Ma · Yogesh Simmhan · Jonathan Mace · Chetan Bansal · Rujia Wang · S R		Mission City Ballroom #4
LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions Session 12: Edge and Cloud Systems Jianheng Ling · Pratik Worah · Yawen Wang · Yunchuan Kong · Chunlei Wang · Clifford Stein · Diwakar Gupta · Jason Behmer · Logan Bush · Prakash Ramanan · Rajesh Kumar · Thomas Chestna · Yajing Liu · Ying Liu · Ye Zhao · Kathryn S. McKinley · Meeyoung Park · Martin Maas		Mission City Ballroom #8
Photon: Federated LLM Pre-Training Session 11: Federated Learning Lorenzo Sani · Alex Iacob · Zeyu Cao · Royson Lee · Bill Marino · Yan Gao · Wanru Zhao · Dongqi Cai · Zexi Li · Xinchi Qiu · Nic Lane		Mission City Ballroom #9
SwiftVI: Time-Efficient Planning and Learning with MDPs Session 6: Edge and Cloud Systems Kasper Overgaard Mortensen · Konstantinos Skitsas · Emil Morre Christensen · Mohammad Sadegh Talebi · Andreas Pavlogiannis · Davide Mottin · Panagiotis Karras		Mission City Ballroom #10
Youmu: Efficient Columnar Data Pipeline for LLM Training Session 5: LLM training and fine-tuning Tianle Zhong · Jiechen Zhao · Qiang Su · Geoffrey Fox		Mission City Ballroom #17
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Session 7: Quantization and Sparsity Shang Yang · Junxian Guo · Haotian Tang · Qinghao Hu · Guangxuan Xiao · Jiaming Tang · Yujun Lin · Zhijian Liu · Yao Lu · Song Han		Mission City Ballroom #19
FedProphet: Memory-Efficient Federated Adversarial Training via Robust and Consistent Cascade Learning Session 11: Federated Learning Minxue Tang · Yitu Wang · Jingyang Zhang · Louis DiValentin · Aolin Ding · Amin Hass · Yiran Chen · Hai Li		Mission City Ballroom #24
Optimizing LLM Queries in Relational Data Analytics Workloads Session 6: Edge and Cloud Systems Shu Liu · Asim Biswal · Audrey Cheng · Amog Kamsetty · Luis Gaspar Schroeder · Liana Patel · Shiyi Cao · Xiangxi Mo · Ion Stoica · Joseph Gonzalez · Matei Zaharia		Mission City Ballroom #28
Marconi: Prefix Caching for the Era of Hybrid LLMs Session 10: LLM and Diffusion Model Serving Rui Pan · Zhuang Wang · Zhen Jia · Can Karakus · Luca Zancato · Tri Dao · Yida Wang · Ravi Netravali		Mission City Ballroom #29
HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression Session 5: LLM training and fine-tuning Yujin Wang · Shunan Dong · Zongle Huang · Yichen You · Liu He · Huazhong Yang · Yongpan Liu · Hongyang Jia		Mission City Ballroom #35
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking Session 7: Quantization and Sparsity Marco Federici · Davide Belli · Mart van Baalen · Amir Jalalirad · Andrii Skliar · Bence Major · Markus Nagel · Paul Whatmough		Mission City Ballroom #38
COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts Session 9: Parallel and Distributed Systems Shulai Zhang · Ningxin Zheng · Haibin Lin · Ziheng Jiang · Wenlei Bao · Chengquan Jiang · Qi Hou · Weihao Cui · Size Zheng · Li-Wen Chang · Quan Chen · Xin Liu		Mission City Ballroom #43
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs Session 12: Edge and Cloud Systems Abhishek Moitra · Arkapravo Ghosh · Shrey Agrawal · Aporva Amarnath · Karthik Swaminathan · Priyadarshini Panda		Mission City Ballroom #45
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution Session 4: Reliable and Scalable Systems Zhiqiang Xie · Hao Kang · Ying Sheng · Tushar Krishna · Kayvon Fatahalian · Christos Kozyrakis		Mission City Ballroom #46
Venn: Resource Management For Collaborative Learning Jobs Session 11: Federated Learning Jiachen Liu · Fan Lai · Eric Ding · Yiwen Zhang · Mosharaf Chowdhury		Mission City Ballroom #50
Balancing Pipeline Parallelism with Vocabulary Parallelism Session 9: Parallel and Distributed Systems Man Tsung Yeung · Penghui Qi · Min Lin · Xinyi Wan		Mission City Ballroom #52
FlexInfer: Flexible LLM Inference with CPU Computations Session 8: LLM and Diffusion Model Serving Seonjin Na · Geonhwa Jeong · Byung Hoon Ahn · Aaron Jezghani · Jeffrey Young · Christopher Hughes · Tushar Krishna · Hyesoon Kim		Mission City Ballroom #55