Recorded Events
Discover all conference events with available recordings
74 recorded events
We use SlidesLive to live stream conference events and host the recordings. All recordings become freely available on our website 30 days after the conference ends and can be viewed at any time.
Filter by Event Type
Closing Remarks (1 event)
Closing Remarks
Industry (1 event)
Industry Lightning Talks
Invited Talk (4 events)
Extreme PyTorch: Inside the Most Demanding ML Workloads—and the Open Challenges in Building AI Agents to Democratize Them
Presenter:
Soumith Chintala
An AI stack: from scaling AI workloads to evaluating LLMs
Presenter:
Ion Stoica
Hardware-aware training and inference for large-scale AI
Presenter:
Animashree Anandkumar
Opening Remarks (2 events)
Opening Remarks - Young Professional Symposium
Opening Remarks
Panel Discussion (1 event)
Poster (60 events)
A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
Presenters:
Chenxi Yang
Yan Li
Martin Maas
Mustafa Uysal
Ubaid Hafeez
Arif Merchant
Richard McDougall
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine
Presenters:
Carlo Siebenschuh
Kyle Hippe
Ozan Gokdemir
Alexander Brace
Arham Khan
Khalid Hossain
Yadu Babuji
Nicholas Chia
Venkatram Vishwanath
Arvind Ramanathan
Rick Stevens
Ian Foster
Robert Underwood
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution
Presenters:
Zhiqiang Xie
Hao Kang
Ying Sheng
Tushar Krishna
Kayvon Fatahalian
Christos Kozyrakis
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Presenters:
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
S R
APOLLO: SGD-like Memory, AdamW-level Performance
Presenters:
Hanqing Zhu
Zhenyu Zhang
Wenyan Cong
Xi Liu
Sem Park
Vikas Chandra
Bo Long
David Pan
Atlas Wang
Jinwon Lee
Balancing Pipeline Parallelism with Vocabulary Parallelism
Presenters:
Man Tsung Yeung
Penghui Qi
Min Lin
Xinyi Wan
COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
Presenters:
Shulai Zhang
Ningxin Zheng
Haibin Lin
Ziheng Jiang
Wenlei Bao
Chengquan Jiang
Qi Hou
Weihao Cui
Size Zheng
Li-Wen Chang
Quan Chen
Xin Liu
Context Parallelism for Scalable Million-Token Inference
Presenters:
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jongsoo Park
Jianyu Huang
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling
Presenters:
Sohaib Ahmad
Qizheng Yang
Haoliang Wang
Ramesh Sitaraman
Hui Guan
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Presenters:
Marco Federici
Davide Belli
Mart van Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul Whatmough
Efficient On-Device Machine Learning with a Biologically-Plausible Forward-Only Algorithm
Presenters:
Baichuan Huang
Amir Aminifar
Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
Presenters:
Geonhwa Jeong
Po-An Tsai
Abhimanyu Rajeshkumar Bambhaniya
Stephen Keckler
Tushar Krishna
FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
Presenters:
Zaifeng Pan
Yitong Ding
Yue Guan
Zheng Wang
Zhongkai Yu
Xulong Tang
Yida Wang
Yufei Ding
FedProphet: Memory-Efficient Federated Adversarial Training via Robust and Consistent Cascade Learning
Presenters:
Minxue Tang
Yitu Wang
Jingyang Zhang
Louis DiValentin
Aolin Ding
Amin Hass
Yiran Chen
Hai Li
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Presenters:
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Yineng Zhang
Stephanie Wang
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
Luis Ceze
FlexAttention: A Programming Model for Generating Fused Attention Variants
Presenters:
Juechu Dong
Boyuan Feng
Driss Guessous
Yanbo Liang
Horace He
FlexInfer: Flexible LLM Inference with CPU Computations
Presenters:
Seonjin Na
Geonhwa Jeong
Byung Hoon Ahn
Aaron Jezghani
Jeffrey Young
Christopher Hughes
Tushar Krishna
Hyesoon Kim
FLStore: Efficient Federated Learning Storage for non-training workloads
Presenters:
Ahmad Faraz Khan
Samuel Fountain
Ahmed Mohamed Abdelmoniem Sayed
Ali R. Butt
Ali Anwar
Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs
Presenters:
Zichao Yue
Chenhui Deng
Zhiru Zhang
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Presenters:
Sandeep Polisetty
Juelin Liu
Yi Fung
Seung-Hwan Lim
Hui Guan
Marco Serafini
HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression
Presenters:
Yujin Wang
Shunan Dong
Zongle Huang
Yichen You
Liu He
Huazhong Yang
Yongpan Liu
Hongyang Jia
Interference-aware Edge Runtime Prediction with Conformal Matrix Completion
Presenters:
Tianshu Huang
Arjun Ramesh
Emily Ruppel
Nuno Pereira
Anthony Rowe
Carlee Joe-Wong
LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions
Presenters:
Jianheng Ling
Pratik Worah
Yawen Wang
Yunchuan Kong
Chunlei Wang
Clifford Stein
Diwakar Gupta
Jason Behmer
Logan Bush
Prakash Ramanan
Rajesh Kumar
Thomas Chestna
Yajing Liu
Ying Liu
Ye Zhao
Kathryn S. McKinley
Meeyoung Park
Martin Maas
LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Presenters:
Rya Sanovar
Srikant Bharadwaj
Renée St. Amant
Victor Ruehle
Saravan Rajmohan
Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers
Presenters:
Francesco Daghero
Daniele Jahier Pagliari
Francesco Conti
Luca Benini
Massimo Poncino
Alessio Burrello
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Presenters:
Shang Yang
Junxian Guo
Haotian Tang
Qinghao Hu
Guangxuan Xiao
Jiaming Tang
Yujun Lin
Zhijian Liu
Yao Lu
Song Han
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Presenters:
Mingyu Liang
Hiwot Kassa
Wenyin Fu
Brian Coutinho
Louis Feng
Christina Delimitrou
Marconi: Prefix Caching for the Era of Hybrid LLMs
Presenters:
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Yida Wang
Ravi Netravali
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Presenters:
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs
Presenters:
Abhishek Moitra
Arkapravo Ghosh
Shrey Agrawal
Aporva Amarnath
Karthik Swaminathan
Priyadarshini Panda
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Presenters:
Beichen Huang
Yueming Yuan
Zelei Shao
Minjia Zhang
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Presenters:
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Ion Stoica
Minlan Yu
On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
Presenters:
Maximilian Böther
Abe Sebastian
Pranjal Awasthi
Ana Klimovic
Srikumar Ramalingam
Optimizing LLM Queries in Relational Data Analytics Workloads
Presenters:
Shu Liu
Asim Biswal
Audrey Cheng
Amog Kamsetty
Luis Gaspar Schroeder
Liana Patel
Shiyi Cao
Xiangxi Mo
Ion Stoica
Joseph Gonzalez
Matei Zaharia
Photon: Federated LLM Pre-Training
Presenters:
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
Yan Gao
Wanru Zhao
Dongqi Cai
Zexi Li
Xinchi Qiu
Nic Lane
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Presenters:
Daiyaan Arfeen
Zhen Zhang
Xinwei Fu
Gregory R. Ganger
Yida Wang
ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription In the Cloud
Presenters:
Lu Wang
Mayukh Das
Fangkai Yang
Bo Qiao
Hang Dong
Si Qin
Victor Ruehle
Chetan Bansal
Eli Cortez
Íñigo Goiri
S R
Qingwei Lin
Dongmei Zhang
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Presenters:
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
Radius: Range-based Gradient Sparsity for Large Foundation Model Pre-training
Presenters:
Mingkai Zheng
Zhao Zhang
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Presenters:
Zhiyu Mei
Wei Fu
Kaiwei Li
Guangju Wang
Huanchen Zhang
Yi Wu
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
Presenters:
Wei Gao
Xinyu Zhou
Peng Sun
Tianwei Zhang
Yonggang Wen
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Presenters:
Xinyi Zhang
Hanyu Zhao
Wencong Xiao
Xianyan Jia
Fei Xu
Yong Li
Wei Lin
Fangming Liu
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Presenters:
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
Guanyu Feng
Xin Lv
Xiao Chuanfu
Dahua Lin
Chao Yang
ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation
Presenters:
Jiacheng Yang
Jun Wu
Zhen Zhang
Xinwei Fu
Zhiying Xu
Zhen Jia
Yida Wang
Gennady Pekhimenko
Scaling Deep Learning Training with MPMD Pipeline Parallelism
Presenters:
Anxhelo Xhebraj
Sean Lee
Hanfeng Chen
Vinod Grover
Seesaw: High-throughput LLM Inference via Model Re-sharding
Presenters:
Qidong Su
Wei Zhao
Xin Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Presenters:
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SOLA: Optimizing SLO Attainment for Large Language Model Serving with State-Aware Scheduling
Presenters:
Ke Hong
Xiuhong Li
Lufang Chen
Qiuli Mao
Guohao Dai
Xuefei Ning
Shengen Yan
Yun Liang
Yu Wang
SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations
Presenters:
Md Saidul Hoque Anik
Ariful Azad
Supply-Chain Attacks in Machine Learning Frameworks
Presenters:
Yue Gao
Ilia Shumailov
Kassem Fawaz
SwiftVI: Time-Efficient Planning and Learning with MDPs
Presenters:
Kasper Overgaard Mortensen
Konstantinos Skitsas
Emil Morre Christensen
Mohammad Sadegh Talebi
Andreas Pavlogiannis
Davide Mottin
Panagiotis Karras
The Hidden Bloat in Machine Learning Systems
Presenters:
Huaifeng Zhang
Ahmed Ali-Eldin Hassan
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Presenters:
Youhe Jiang
Fangcheng Fu
Xiaozhe Yao
Taiyi Wang
Bin Cui
Ana Klimovic
Eiko Yoneki
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
Presenters:
Size Zheng
Jin Fang
Xuegui Zheng
Qi Hou
Wenlei Bao
Ningxin Zheng
Ziheng Jiang
Dongyang Wang
Jianxi Ye
Haibin Lin
Li-Wen Chang
Xin Liu
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Presenters:
Jinghan Yao
Sam Jacobs
Masahiro Tanaka
Olatunji Ruwase
Hari Subramoni
Dhabaleswar Panda
TurboAttention: Efficient Attention Approximation for High-Throughput LLMs
Presenters:
Hao Kang
Srikant Bharadwaj
James Hensman
Tushar Krishna
Victor Ruehle
Saravan Rajmohan
Venn: Resource Management For Collaborative Learning Jobs
Presenters:
Jiachen Liu
Fan Lai
Eric Ding
Yiwen Zhang
Mosharaf Chowdhury
VoLUT: Efficient Volumetric Streaming Enhanced by LUT-Based Super-Resolution
Presenters:
Chendong Wang
Anlan Zhang
Yifan Yang
Lili Qiu
Yuqing Yang
Xinyang Jiang
Feng Qian
Suman Banerjee
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Presenters:
Yixin Dong
Charlie Ruan
Yaxing Cai
Ziyi Xu
Yilong Zhao
Ruihang Lai
Tianqi Chen
Youmu: Efficient Columnar Data Pipeline for LLM Training
Presenters:
Tianle Zhong
Jiechen Zhao
Qiang Su
Geoffrey Fox
Poster Session (1 event)
Poster Session and Reception - Young Professional Symposium
Talk (4 events)
LMArena: An Open Platform for Crowdsourced AI Benchmarks
Presenter:
Wei-Lin Chiang