

Poster

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

Shang Yang ⋅ Junxian Guo ⋅ Haotian Tang ⋅ Qinghao Hu ⋅ Guangxuan Xiao ⋅ Jiaming Tang ⋅ Yujun Lin ⋅ Zhijian Liu ⋅ Yao Lu ⋅ Song Han


