Poster

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Zihao Ye ⋅ Lequn Chen ⋅ Ruihang Lai ⋅ Wuwei Lin ⋅ Yineng Zhang ⋅ Stephanie Wang ⋅ Tianqi Chen ⋅ Baris Kasikci ⋅ Vinod Grover ⋅ Arvind Krishnamurthy ⋅ Luis Ceze
Outstanding Paper Award
