Poster

FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference

Zaifeng Pan ⋅ Yitong Ding ⋅ Yue Guan ⋅ Zheng Wang ⋅ Zhongkai Yu ⋅ Xulong Tang ⋅ Yida Wang ⋅ Yufei Ding
