(3 events)
Poster
Wed May 15 09:00 AM -- 09:20 AM (PDT) @ Poster Position Number 33
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
Ke Hong · Guohao Dai · Jiaming Xu · Qiuli Mao · Xiuhong Li · Jun Liu · Kangdi Chen · Yuhan Dong · Yu Wang
Poster
Wed May 15 09:20 AM -- 09:40 AM (PDT) @ Poster Position Number 25
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim · Guojun Chen · Seung-seob Lee · Nikhil Sarda · Anurag Khandelwal · Lin Zhong
Poster
Wed May 15 09:40 AM -- 10:00 AM (PDT) @ Poster Position Number 22
Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference
Muhammad Adnan · Akhil Arunkumar · Gaurav Jain · Prashant Nair · Ilya Soloveychik · Purushotham Kamath