MLSys 2025 List of Accepted Papers
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Session 7: Quantization and Sparsity
Marco Federici · Davide Belli · Mart van Baalen · Amir Jalalirad · Andrii Skliar · Bence Major · Markus Nagel · Paul Whatmough
Mission City Ballroom #38
Youmu: Efficient Columnar Data Pipeline for LLM Training
Session 5: LLM training and fine-tuning
Tianle Zhong · Jiechen Zhao · Qiang Su · Geoffrey Fox
Mission City Ballroom #17
Efficient On-Device Machine Learning with a Biologically-Plausible Forward-Only Algorithm
Session 6: Edge and Cloud Systems
Baichuan Huang · Amir Aminifar
Mission City Ballroom #25
FLStore: Efficient Federated Learning Storage for non-training workloads
Session 11: Federated Learning
Ahmad Faraz Khan · Samuel Fountain · Ahmed Mohamed Abdelmoniem Sayed · Ali R. Butt · Ali Anwar
Mission City Ballroom #26
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine
Session 2: Parallel and Distributed Systems
Carlo Siebenschuh · Kyle Hippe · Ozan Gokdemir · Alexander Brace · Arham Khan · Khalid Hossain · Yadu Babuji · Nicholas Chia · Venkatram Vishwanath · Arvind Ramanathan · Rick Stevens · Ian Foster · Robert Underwood
Mission City Ballroom #60
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Session 11: Federated Learning
Mohammadali Shakerdargah · Shan Lu · Chao Gao · Di Niu
Mission City Ballroom #44
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs
Session 12: Edge and Cloud Systems
Abhishek Moitra · Arkapravo Ghosh · Shrey Agrawal · Aporva Amarnath · Karthik Swaminathan · Priyadarshini Panda
Mission City Ballroom #45
SwiftVI: Time-Efficient Planning and Learning with MDPs
Session 6: Edge and Cloud Systems
Kasper Overgaard Mortensen · Konstantinos Skitsas · Emil Morre Christensen · Mohammad Sadegh Talebi · Andreas Pavlogiannis · Davide Mottin · Panagiotis Karras
Mission City Ballroom #10
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling
Session 1: LLM and Diffusion Model Serving
Sohaib Ahmad · Qizheng Yang · Haoliang Wang · Ramesh Sitaraman · Hui Guan
Mission City Ballroom #2
Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs
Session 12: Edge and Cloud Systems
Zichao Yue · Chenhui Deng · Zhiru Zhang
Mission City Ballroom #15
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Session 7: Quantization and Sparsity
Qianchao Zhu · Jiangfei Duan · Chang Chen · Siran Liu · Xiuhong Li · Guanyu Feng · Xin Lv · Xiao Chuanfu · Dahua Lin · Chao Yang
Mission City Ballroom #31
Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers
Session 7: Quantization and Sparsity
Francesco Daghero · Daniele Jahier Pagliari · Francesco Conti · Luca Benini · Massimo Poncino · Alessio Burrello
Mission City Ballroom #22
SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations
Session 7: Quantization and Sparsity
Md Saidul Hoque Anik · Ariful Azad
Mission City Ballroom #7
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Session 5: LLM training and fine-tuning
Zhiyu Mei · Wei Fu · Kaiwei Li · Guangju Wang · Huanchen Zhang · Yi Wu
Mission City Ballroom #61
On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
Session 9: Parallel and Distributed Systems
Maximilian Böther · Abe Sebastian · Pranjal Awasthi · Ana Klimovic · Srikumar Ramalingam
Mission City Ballroom #56
Interference-aware Edge Runtime Prediction with Conformal Matrix Completion
Session 4: Reliable and Scalable Systems
Tianshu Huang · Arjun Ramesh · Emily Ruppel · Nuno Pereira · Anthony Rowe · Carlee Joe-Wong
Mission City Ballroom #47
Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
Session 3: Quantization and Sparsity
Geonhwa Jeong · Po-An Tsai · Abhimanyu Rajeshkumar Bambhaniya · Stephen Keckler · Tushar Krishna
Mission City Ballroom #27
FlexInfer: Flexible LLM Inference with CPU Computations
Session 8: LLM and Diffusion Model Serving
Seonjin Na · Geonhwa Jeong · Byung Hoon Ahn · Aaron Jezghani · Jeffrey Young · Christopher Hughes · Tushar Krishna · Hyesoon Kim
Mission City Ballroom #55
Balancing Pipeline Parallelism with Vocabulary Parallelism
Session 9: Parallel and Distributed Systems
Man Tsung Yeung · Penghui Qi · Min Lin · Xinyi Wan
Mission City Ballroom #52
A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
Session 6: Edge and Cloud Systems
Chenxi Yang · Yan Li · Martin Maas · Mustafa Uysal · Ubaid Hafeez · Arif Merchant · Richard McDougall
Mission City Ballroom #18
Scaling Deep Learning Training with MPMD Pipeline Parallelism
Session 9: Parallel and Distributed Systems
Anxhelo Xhebraj · Sean Lee · Hanfeng Chen · Vinod Grover
Mission City Ballroom #32
LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Session 1: LLM and Diffusion Model Serving
Rya Sanovar · Srikant Bharadwaj · Renée St. Amant · Victor Ruehle · Saravan Rajmohan
Mission City Ballroom #20
TurboAttention: Efficient Attention Approximation for High Throughputs LLMs
Session 8: LLM and Diffusion Model Serving
Hao Kang · Srikant Bharadwaj · James Hensman · Tushar Krishna · Victor Ruehle · Saravan Rajmohan
Mission City Ballroom #39
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Session 3: Quantization and Sparsity
Vithursan Thangarasa · Ganesh Venkatesh · Mike Lasby · Nish Sinnadurai · Sean Lie
Mission City Ballroom #42
The Hidden Bloat in Machine Learning Systems
Huaifeng Zhang · Ahmed Ali-Eldin Hassan
Mission City Ballroom #51
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Session 5: LLM training and fine-tuning
Mingyu Liang · Hiwot Kassa · Wenyin Fu · Brian Coutinho · Louis Feng · Christina Delimitrou
Mission City Ballroom #49
Supply-Chain Attacks in Machine Learning Frameworks
Session 12: Edge and Cloud Systems
Yue Gao · Ilia Shumailov · Kassem Fawaz
Mission City Ballroom #13
ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription In the Cloud
Session 6: Edge and Cloud Systems
Lu Wang · Mayukh Das · Fangkai Yang · Bo Qiao · Hang Dong · Si Qin · Victor Ruehle · Chetan Bansal · Eli Cortez · Íñigo Goiri · S R · Qingwei Lin · Dongmei Zhang
Mission City Ballroom #12
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution
Session 4: Reliable and Scalable Systems
Zhiqiang Xie · Hao Kang · Ying Sheng · Tushar Krishna · Kayvon Fatahalian · Christos Kozyrakis
Mission City Ballroom #46
Context Parallelism for Scalable Million-Token Inference
Session 2: Parallel and Distributed Systems
Amy Yang · Jingyi Yang · Aya Ibrahim · Xinfeng Xie · Bangsheng Tang · Grigory Sizov · Jongsoo Park · Jianyu Huang
Mission City Ballroom #34
Optimizing LLM Queries in Relational Data Analytics Workloads
Session 6: Edge and Cloud Systems
Shu Liu · Asim Biswal · Audrey Cheng · Amog Kamsetty · Luis Gaspar Schroeder · Liana Patel · Shiyi Cao · Xiangxi Mo · Ion Stoica · Joseph Gonzalez · Matei Zaharia
Mission City Ballroom #28
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Session 10: LLM and Diffusion Model Serving
Xuanlin Jiang · Yang Zhou · Shiyi Cao · Ion Stoica · Minlan Yu
Mission City Ballroom #59
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Session 5: LLM training and fine-tuning
Jinghan Yao · Sam Jacobs · Masahiro Tanaka · Olatunji Ruwase · Hari Subramoni · Dhabaleswar Panda
Mission City Ballroom #21
FlexAttention: A Programming Model for Generating Fused Attention Variants
Session 10: LLM and Diffusion Model Serving
Juechu Dong · Boyuan Feng · Driss Guessous · Yanbo Liang · Horace He
Mission City Ballroom #3
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Session 4: Reliable and Scalable Systems
Yinfang Chen · Manish Shetty · Gagan Somashekar · Minghua Ma · Yogesh Simmhan · Jonathan Mace · Chetan Bansal · Rujia Wang · S R
Mission City Ballroom #4
FedProphet: Memory-Efficient Federated Adversarial Training via Robust and Consistent Cascade Learning
Session 11: Federated Learning
Minxue Tang · Yitu Wang · Jingyang Zhang · Louis DiValentin · Aolin Ding · Amin Hass · Yiran Chen · Hai Li
Mission City Ballroom #24
Venn: Resource Management For Collaborative Learning Jobs
Session 11: Federated Learning
Jiachen Liu · Fan Lai · Eric Ding · Yiwen Zhang · Mosharaf Chowdhury
Mission City Ballroom #50
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Session 10: LLM and Diffusion Model Serving
Yixin Dong · Charlie Ruan · Yaxing Cai · Ziyi Xu · Yilong Zhao · Ruihang Lai · Tianqi Chen
Mission City Ballroom #54
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Session 10: LLM and Diffusion Model Serving
Youhe Jiang · Fangcheng Fu · Xiaozhe Yao · Taiyi Wang · Bin Cui · Ana Klimovic · Eiko Yoneki
Mission City Ballroom #5
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Session 2: Parallel and Distributed Systems
Sandeep Polisetty · Juelin Liu · Yi Fung · Seung-Hwan Lim · Hui Guan · Marco Serafini
Mission City Ballroom #40
HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression
Session 5: LLM training and fine-tuning
Yujin Wang · Shunan Dong · Zongle Huang · Yichen You · Liu He · Huazhong Yang · Yongpan Liu · Hongyang Jia
Mission City Ballroom #35
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Session 7: Quantization and Sparsity
Shang Yang · Junxian Guo · Haotian Tang · Qinghao Hu · Guangxuan Xiao · Jiaming Tang · Yujun Lin · Zhijian Liu · Yao Lu · Song Han
Mission City Ballroom #19
LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions
Jianheng Ling · Pratik Worah · Yawen Wang · Yunchuan Kong · Chunlei Wang · Clifford Stein · Diwakar Gupta · Jason Behmer · Logan Bush · Prakash Ramanan · Rajesh Kumar · Thomas Chestna · Yajing Liu · Ying Liu · Ye Zhao · Kathryn S. McKinley · Meeyoung Park · Martin Maas
Mission City Ballroom #8
Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Session 4: Reliable and Scalable Systems
Neel P. Bhatt · Yunhao Yang · Rohan Siva · Daniel Milan · Ufuk Topcu · Atlas Wang
Mission City Ballroom #16
VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
Session 12: Edge and Cloud Systems
Chendong Wang · Anlan Zhang · Yifan Yang · Lili Qiu · Yuqing Yang · Xinyang Jiang · Feng Qian · Suman Banerjee
Mission City Ballroom #14
SOLA: Optimizing SLO Attainment for Large Language Model Serving with State-Aware Scheduling
Session 8: LLM and Diffusion Model Serving
Ke Hong · Xiuhong Li · Lufang Chen · Qiuli Mao · Guohao Dai · Xuefei Ning · Shengen Yan · Yun Liang · Yu Wang
Mission City Ballroom #58
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Session 3: Quantization and Sparsity
Beichen Huang · Yueming Yuan · Zelei Shao · Minjia Zhang
Mission City Ballroom #23
Photon: Federated LLM Pre-Training
Session 11: Federated Learning
Lorenzo Sani · Alex Iacob · Zeyu Cao · Royson Lee · Bill Marino · Yan Gao · Wanru Zhao · Dongqi Cai · Zexi Li · Xinchi Qiu · Nic Lane
Mission City Ballroom #9
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su · Wei Zhao · Xin Li · Muralidhar Andoorveedu · Chenhao Jiang · Zhanda Zhu · Kevin Song · Christina Giannoula · Gennady Pekhimenko
Mission City Ballroom #36
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Session 2: Parallel and Distributed Systems
Daiyaan Arfeen · Zhen Zhang · Xinwei Fu · Gregory R. Ganger · Yida Wang
Mission City Ballroom #6
Radius: Range-based Gradient Sparsity for Large Foundation Model Pre-training
Session 3: Quantization and Sparsity
Mingkai Zheng · Zhao Zhang
Mission City Ballroom #33
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Session 3: Quantization and Sparsity
Yujun Lin · Haotian Tang · Shang Yang · Zhekai Zhang · Guangxuan Xiao · Chuang Gan · Song Han
Mission City Ballroom #1
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Session 2: Parallel and Distributed Systems
Xinyi Zhang · Hanyu Zhao · Wencong Xiao · Xianyan Jia · Fei Xu · Yong Li · Wei Lin · Fangming Liu
Mission City Ballroom #57
ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation
Session 8: LLM and Diffusion Model Serving
Jiacheng Yang · Jun Wu · Zhen Zhang · Xinwei Fu · Zhiying Xu · Zhen Jia · Yida Wang · Gennady Pekhimenko
Mission City Ballroom #37
APOLLO: SGD-like Memory, AdamW-level Performance
Hanqing Zhu · Zhenyu Zhang · Wenyan Cong · Xi Liu · Sem Park · Vikas Chandra · Bo Long · David Pan · Atlas Wang · Jinwon Lee
Mission City Ballroom #48
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
Session 1: LLM and Diffusion Model Serving
Wei Gao · Xinyu Zhou · Peng Sun · Tianwei Zhang · Yonggang Wen
Mission City Ballroom #53
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan · Zhuang Wang · Zhen Jia · Can Karakus · Luca Zancato · Tri Dao · Yida Wang · Ravi Netravali
Mission City Ballroom #29
FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
Session 1: LLM and Diffusion Model Serving
Zaifeng Pan · Yitong Ding · Yue Guan · Zheng Wang · Zhongkai Yu · Xulong Tang · Yida Wang · Yufei Ding
Mission City Ballroom #11
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye · Lequn Chen · Ruihang Lai · Wuwei Lin · Yineng Zhang · Stephanie Wang · Tianqi Chen · Baris Kasikci · Vinod Grover · Arvind Krishnamurthy · Luis Ceze
Mission City Ballroom #30
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
Session 9: Parallel and Distributed Systems
Size Zheng · Jin Fang · Xuegui Zheng · Qi Hou · Wenlei Bao · Ningxin Zheng · Ziheng Jiang · Dongyang Wang · Jianxi Ye · Haibin Lin · Li-Wen Chang · Xin Liu
Mission City Ballroom #41
COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
Shulai Zhang · Ningxin Zheng · Haibin Lin · Ziheng Jiang · Wenlei Bao · Chengquan Jiang · Qi Hou · Weihao Cui · Size Zheng · Li-Wen Chang · Quan Chen · Xin Liu
Mission City Ballroom #43