Registration Desk: Registration and Check-in Wed 14 May 08:00 a.m.
Poster: Session 5: LLM Training and Fine-Tuning Wed 14 May 08:30 a.m.
[ Mission City Ballroom ]
Invited Talk: Animashree Anandkumar
Hardware-aware training and inference for large-scale AI
The scaling of large language models has led to impressive gains in language understanding, but at the cost of insatiable memory and bandwidth requirements. We take a principled approach to designing optimization and quantization algorithms that reduce memory requirements without sacrificing accuracy. This includes gradient compression methods (GaLore, SignSGD) and a logarithmic number system for representation. We also design fine-grained memory reduction schemes such as KV cache compression, chunking, and offloading to overcome memory bottlenecks in language models, especially in reasoning mode, where current memory requirements are massive. Such principles are broadly applicable and especially relevant to physical AI, where the memory and bandwidth requirements are even greater than those of frontier LLMs.
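To make the gradient-compression idea concrete, below is a minimal, self-contained sketch of sign-based compression in the spirit of SignSGD: each gradient tensor is reduced to its elementwise sign plus a single scale, shrinking the payload from 32 bits to roughly 1 bit per element. This is an illustrative sketch, not the speaker's implementation; the function names (compress_gradient, decompress_gradient) are assumptions for the example.

```python
import numpy as np

def compress_gradient(grad: np.ndarray):
    """Return (sign bits, scale): elementwise signs and the mean magnitude."""
    scale = float(np.abs(grad).mean())   # one scalar per tensor
    signs = np.signbit(grad)             # one bit per element (True = negative)
    return signs, scale

def decompress_gradient(signs: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct a low-precision surrogate gradient from signs and scale."""
    return np.where(signs, -scale, scale).astype(np.float32)

# Toy usage: compress a random "gradient", then take a plain SGD step with it.
rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)   # parameters
g = rng.standard_normal(8).astype(np.float32)   # true gradient
signs, scale = compress_gradient(g)
g_hat = decompress_gradient(signs, scale)
w -= 0.1 * g_hat                                # update uses only signs + one scale
```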
Poster: Session 6: Edge and Cloud Systems Wed 14 May 01:15 p.m.
[ Mission City Ballroom ]
Poster: Session 7: Quantization and Sparsity Wed 14 May 02:40 p.m.
[ Mission City Ballroom ]
Poster: Session 8: LLM and Diffusion Model Serving Wed 14 May 04:30 p.m.
[ Mission City Ballroom ]