Recorded Events

Discover all conference events with available recordings

74 recorded events

Conference events are live streamed and recorded through SlidesLive. All recordings become freely available 30 days after the conference ends and are hosted on our website, where they can be viewed at any time.

Filter by Event Type

Closing Remarks (1 event)

Closing Remarks

May 15, 2025 at 6:00 PM
Mission City Ballroom
0.1 hour
View Event & Recording
Industry (1 event)

Industry Lightning Talks

May 12, 2025 at 1:30 PM
Mission City Ballroom
1.0 hour
View Event & Recording
Invited Talk (4 events)

Extreme PyTorch: Inside the Most Demanding ML Workloads—and the Open Challenges in Building AI Agents to Democratize Them

May 12, 2025 at 9:30 AM
Mission City Ballroom
1.0 hour
Presenter:
Soumith Chintala
View Event & Recording

An AI stack: from scaling AI workloads to evaluating LLMs

May 13, 2025 at 10:30 AM
Mission City Ballroom
1.0 hour
Presenter:
Ion Stoica
View Event & Recording

Hardware-aware training and inference for large-scale AI

May 14, 2025 at 10:30 AM
Mission City Ballroom
1.0 hour
Presenter:
Animashree Anandkumar
View Event & Recording

Responsible Finetuning of Large Language Models

May 15, 2025 at 10:30 AM
Mission City Ballroom
1.0 hour
Presenter:
Ling Liu
View Event & Recording
Opening Remarks (2 events)

Opening Remarks - Young Professional Symposium

May 12, 2025 at 9:20 AM
Mission City Ballroom
0.2 hour
View Event & Recording

Opening Remarks

May 13, 2025 at 8:30 AM
Mission City Ballroom
0.3 hour
View Event & Recording
Panel Discussion (1 event)

Panel Discussion

May 12, 2025 at 2:30 PM
Mission City Ballroom
1.0 hour
Presenters:
Manasi Joshi Tim Dettmers Soumith Chintala
View Event & Recording
Poster (60 events)

A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers

Mission City Ballroom
Presenters:
Chenxi Yang Yan Li Martin Maas Mustafa Uysal Ubaid Hafeez Arif Merchant Richard McDougall
View Event & Recording

AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine

Mission City Ballroom
Presenters:
Carlo Siebenschuh Kyle Hippe Ozan Gokdemir Alexander Brace Arham Khan Khalid Hossain Yadu Babuji Nicholas Chia Venkatram Vishwanath Arvind Ramanathan Rick Stevens Ian Foster Robert Underwood
View Event & Recording

AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Mission City Ballroom
Presenters:
Zhiqiang Xie Hao Kang Ying Sheng Tushar Krishna Kayvon Fatahalian Christos Kozyrakis
View Event & Recording

AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds

Mission City Ballroom
Presenters:
Yinfang Chen Manish Shetty Gagan Somashekar Minghua Ma Yogesh Simmhan Jonathan Mace Chetan Bansal Rujia Wang S R
View Event & Recording

APOLLO: SGD-like Memory, AdamW-level Performance

Mission City Ballroom
Presenters:
Hanqing Zhu Zhenyu Zhang Wenyan Cong Xi Liu Sem Park Vikas Chandra Bo Long David Pan Atlas Wang Jinwon Lee
View Event & Recording

Balancing Pipeline Parallelism with Vocabulary Parallelism

Mission City Ballroom
Presenters:
Man Tsung Yeung Penghui Qi Min Lin Xinyi Wan
View Event & Recording

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

Mission City Ballroom
Presenters:
Shulai Zhang Ningxin Zheng Haibin Lin Ziheng Jiang Wenlei Bao Chengquan Jiang Qi Hou Weihao Cui Size Zheng Li-Wen Chang Quan Chen Xin Liu
View Event & Recording

Context Parallelism for Scalable Million-Token Inference

Mission City Ballroom
Presenters:
Amy Yang Jingyi Yang Aya Ibrahim Xinfeng Xie Bangsheng Tang Grigory Sizov Jongsoo Park Jianyu Huang
View Event & Recording

DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling

Mission City Ballroom
Presenters:
Sohaib Ahmad Qizheng Yang Haoliang Wang Ramesh Sitaraman Hui Guan
View Event & Recording

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Mission City Ballroom
Presenters:
Marco Federici Davide Belli Mart van Baalen Amir Jalalirad Andrii Skliar Bence Major Markus Nagel Paul Whatmough
View Event & Recording

Efficient On-Device Machine Learning with a Biologically-Plausible Forward-Only Algorithm

Mission City Ballroom
Presenters:
Baichuan Huang Amir Aminifar
View Event & Recording

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Mission City Ballroom
Presenters:
Geonhwa Jeong Po-An Tsai Abhimanyu Rajeshkumar Bambhaniya Stephen Keckler Tushar Krishna
View Event & Recording

FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference

Mission City Ballroom
Presenters:
Zaifeng Pan Yitong Ding Yue Guan Zheng Wang Zhongkai Yu Xulong Tang Yida Wang Yufei Ding
View Event & Recording

FedProphet: Memory-Efficient Federated Adversarial Training via Robust and Consistent Cascade Learning

Mission City Ballroom
Presenters:
Minxue Tang Yitu Wang Jingyang Zhang Louis DiValentin Aolin Ding Amin Hass Yiran Chen Hai Li
View Event & Recording

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Mission City Ballroom
Presenters:
Zihao Ye Lequn Chen Ruihang Lai Wuwei Lin Yineng Zhang Stephanie Wang Tianqi Chen Baris Kasikci Vinod Grover Arvind Krishnamurthy Luis Ceze
View Event & Recording

FlexAttention: A Programming Model for Generating Fused Attention Variants

Mission City Ballroom
Presenters:
Juechu Dong Boyuan Feng Driss Guessous Yanbo Liang Horace He
View Event & Recording

FlexInfer: Flexible LLM Inference with CPU Computations

Mission City Ballroom
Presenters:
Seonjin Na Geonhwa Jeong Byung Hoon Ahn Aaron Jezghani Jeffrey Young Christopher Hughes Tushar Krishna Hyesoon Kim
View Event & Recording

FLStore: Efficient Federated Learning Storage for Non-training Workloads

Mission City Ballroom
Presenters:
Ahmad Faraz Khan Samuel Fountain Ahmed Mohamed Abdelmoniem Sayed Ali R. Butt Ali Anwar
View Event & Recording

Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs

Mission City Ballroom
Presenters:
Zichao Yue Chenhui Deng Zhiru Zhang
View Event & Recording

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism

Mission City Ballroom
Presenters:
Sandeep Polisetty Juelin Liu Yi Fung Seung-Hwan Lim Hui Guan Marco Serafini
View Event & Recording

HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression

Mission City Ballroom
Presenters:
Yujin Wang Shunan Dong Zongle Huang Yichen You Liu He Huazhong Yang Yongpan Liu Hongyang Jia
View Event & Recording

Interference-aware Edge Runtime Prediction with Conformal Matrix Completion

Mission City Ballroom
Presenters:
Tianshu Huang Arjun Ramesh Emily Ruppel Nuno Pereira Anthony Rowe Carlee Joe-Wong
View Event & Recording

LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions

Mission City Ballroom
Presenters:
Jianheng Ling Pratik Worah Yawen Wang Yunchuan Kong Chunlei Wang Clifford Stein Diwakar Gupta Jason Behmer Logan Bush Prakash Ramanan Rajesh Kumar Thomas Chestna Yajing Liu Ying Liu Ye Zhao Kathryn S. McKinley Meeyoung Park Martin Maas
View Event & Recording

LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Mission City Ballroom
Presenters:
Rya Sanovar Srikant Bharadwaj Renée St. Amant Victor Ruehle Saravan Rajmohan
View Event & Recording

Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

Mission City Ballroom
Presenters:
Francesco Daghero Daniele Jahier Pagliari Francesco Conti Luca Benini Massimo Poncino Alessio Burrello
View Event & Recording

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

Mission City Ballroom
Presenters:
Shang Yang Junxian Guo Haotian Tang Qinghao Hu Guangxuan Xiao Jiaming Tang Yujun Lin Zhijian Liu Yao Lu Song Han
View Event & Recording

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training

Mission City Ballroom
Presenters:
Mingyu Liang Hiwot Kassa Wenyin Fu Brian Coutinho Louis Feng Christina Delimitrou
View Event & Recording

Marconi: Prefix Caching for the Era of Hybrid LLMs

Mission City Ballroom
Presenters:
Rui Pan Zhuang Wang Zhen Jia Can Karakus Luca Zancato Tri Dao Yida Wang Ravi Netravali
View Event & Recording

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices

Mission City Ballroom
Presenters:
Mohammadali Shakerdargah Shan Lu Chao Gao Di Niu
View Event & Recording

MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs

Mission City Ballroom
Presenters:
Abhishek Moitra Arkapravo Ghosh Shrey Agrawal Aporva Amarnath Karthik Swaminathan Priyadarshini Panda
View Event & Recording

MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators

Mission City Ballroom
Presenters:
Beichen Huang Yueming Yuan Zelei Shao Minjia Zhang
View Event & Recording

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Mission City Ballroom
Presenters:
Xuanlin Jiang Yang Zhou Shiyi Cao Ion Stoica Minlan Yu
View Event & Recording

On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

Mission City Ballroom
Presenters:
Maximilian Böther Abe Sebastian Pranjal Awasthi Ana Klimovic Srikumar Ramalingam
View Event & Recording

Optimizing LLM Queries in Relational Data Analytics Workloads

Mission City Ballroom
Presenters:
Shu Liu Asim Biswal Audrey Cheng Amog Kamsetty Luis Gaspar Schroeder Liana Patel Shiyi Cao Xiangxi Mo Ion Stoica Joseph Gonzalez Matei Zaharia
View Event & Recording

Photon: Federated LLM Pre-Training

Mission City Ballroom
Presenters:
Lorenzo Sani Alex Iacob Zeyu Cao Royson Lee Bill Marino Yan Gao Wanru Zhao Dongqi Cai Zexi Li Xinchi Qiu Nic Lane
View Event & Recording

PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training

Mission City Ballroom
Presenters:
Daiyaan Arfeen Zhen Zhang Xinwei Fu Gregory R. Ganger Yida Wang
View Event & Recording

ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription In the Cloud

Mission City Ballroom
Presenters:
Lu Wang Mayukh Das Fangkai Yang Bo Qiao Hang Dong Si Qin Victor Ruehle Chetan Bansal Eli Cortez Íñigo Goiri S R Qingwei Lin Dongmei Zhang
View Event & Recording

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Mission City Ballroom
Presenters:
Yujun Lin Haotian Tang Shang Yang Zhekai Zhang Guangxuan Xiao Chuang Gan Song Han
View Event & Recording

Radius: Range-based Gradient Sparsity for Large Foundation Model Pre-training

Mission City Ballroom
Presenters:
Mingkai Zheng Zhao Zhang
View Event & Recording

ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation

Mission City Ballroom
Presenters:
Zhiyu Mei Wei Fu Kaiwei Li Guangju Wang Huanchen Zhang Yi Wu
View Event & Recording

Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving

Mission City Ballroom
Presenters:
Wei Gao Xinyu Zhou Peng Sun Tianwei Zhang Yonggang Wen
View Event & Recording

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling

Mission City Ballroom
Presenters:
Xinyi Zhang Hanyu Zhao Wencong Xiao Xianyan Jia Fei Xu Yong Li Wei Lin Fangming Liu
View Event & Recording

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Mission City Ballroom
Presenters:
Qianchao Zhu Jiangfei Duan Chang Chen Siran Liu Xiuhong Li Guanyu Feng Xin Lv Xiao Chuanfu Dahua Lin Chao Yang
View Event & Recording

ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation

Mission City Ballroom
Presenters:
Jiacheng Yang Jun Wu Zhen Zhang Xinwei Fu Zhiying Xu Zhen Jia Yida Wang Gennady Pekhimenko
View Event & Recording

Scaling Deep Learning Training with MPMD Pipeline Parallelism

Mission City Ballroom
Presenters:
Anxhelo Xhebraj Sean Lee Hanfeng Chen Vinod Grover
View Event & Recording

Seesaw: High-throughput LLM Inference via Model Re-sharding

Mission City Ballroom
Presenters:
Qidong Su Wei Zhao Xin Li Muralidhar Andoorveedu Chenhao Jiang Zhanda Zhu Kevin Song Christina Giannoula Gennady Pekhimenko
View Event & Recording

Self-Data Distillation for Recovering Quality in Pruned Large Language Models

Mission City Ballroom
Presenters:
Vithursan Thangarasa Ganesh Venkatesh Mike Lasby Nish Sinnadurai Sean Lie
View Event & Recording

SOLA: Optimizing SLO Attainment for Large Language Model Serving with State-Aware Scheduling

Mission City Ballroom
Presenters:
Ke Hong Xiuhong Li Lufang Chen Qiuli Mao Guohao Dai Xuefei Ning Shengen Yan Yun Liang Yu Wang
View Event & Recording

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Mission City Ballroom
Presenters:
Md Saidul Hoque Anik Ariful Azad
View Event & Recording

Supply-Chain Attacks in Machine Learning Frameworks

Mission City Ballroom
Presenters:
Yue Gao Ilia Shumailov Kassem Fawaz
View Event & Recording

SwiftVI: Time-Efficient Planning and Learning with MDPs

Mission City Ballroom
Presenters:
Kasper Overgaard Mortensen Konstantinos Skitsas Emil Morre Christensen Mohammad Sadegh Talebi Andreas Pavlogiannis Davide Mottin Panagiotis Karras
View Event & Recording

The Hidden Bloat in Machine Learning Systems

Mission City Ballroom
Presenters:
Huaifeng Zhang Ahmed Ali-Eldin Hassan
View Event & Recording

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments

Mission City Ballroom
Presenters:
Youhe Jiang Fangcheng Fu Xiaozhe Yao Taiyi Wang Bin Cui Ana Klimovic Eiko Yoneki
View Event & Recording

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives

Mission City Ballroom
Presenters:
Size Zheng Jin Fang Xuegui Zheng Qi Hou Wenlei Bao Ningxin Zheng Ziheng Jiang Dongyang Wang Jianxi Ye Haibin Lin Li-Wen Chang Xin Liu
View Event & Recording

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer

Mission City Ballroom
Presenters:
Jinghan Yao Sam Jacobs Masahiro Tanaka Olatunji Ruwase Hari Subramoni Dhabaleswar Panda
View Event & Recording

TurboAttention: Efficient Attention Approximation for High Throughputs LLM

Mission City Ballroom
Presenters:
Hao Kang Srikant Bharadwaj James Hensman Tushar Krishna Victor Ruehle Saravan Rajmohan
View Event & Recording

Venn: Resource Management For Collaborative Learning Jobs

Mission City Ballroom
Presenters:
Jiachen Liu Fan Lai Eric Ding Yiwen Zhang Mosharaf Chowdhury
View Event & Recording

VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-resolution

Mission City Ballroom
Presenters:
Chendong Wang Anlan Zhang Yifan Yang Lili Qiu Yuqing Yang Xinyang Jiang Feng Qian Suman Banerjee
View Event & Recording

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

Mission City Ballroom
Presenters:
Yixin Dong Charlie Ruan Yaxing Cai Ziyi Xu Yilong Zhao Ruihang Lai Tianqi Chen
View Event & Recording

Youmu: Efficient Columnar Data Pipeline for LLM Training

Mission City Ballroom
Presenters:
Tianle Zhong Jiechen Zhao Qiang Su Geoffrey Fox
View Event & Recording
Poster Session (1 event)

Poster Session and Reception - Young Professional Symposium

May 12, 2025 at 4:00 PM
Mission City Ballroom
2.0 hours
View Event & Recording
Talk (4 events)

Lessons Learned from Successful PhD Students

May 12, 2025 at 10:45 AM
Mission City Ballroom
0.3 hour
Presenter:
Tim Dettmers
View Event & Recording

LMArena: An Open Platform for Crowdsourced AI Benchmarks

May 12, 2025 at 11:05 AM
Mission City Ballroom
0.3 hour
Presenter:
Wei-Lin Chiang
View Event & Recording

Designing Models from the Hardware Up

May 12, 2025 at 11:25 AM
Mission City Ballroom
0.3 hour
Presenter:
Simran Arora
View Event & Recording

YPS - Talk by Beidi Chen

May 12, 2025 at 11:45 AM
Mission City Ballroom
0.3 hour
Presenter:
Beidi Chen
View Event & Recording