

Poster

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Zihao Ye · Lequn Chen · Ruihang Lai · Wuwei Lin · Yineng Zhang · Stephanie Wang · Tianqi Chen · Baris Kasikci · Vinod Grover · Arvind Krishnamurthy · Luis Ceze
Outstanding Paper Award

Abstract
