Poster

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin ⋅ Haotian Tang ⋅ Shang Yang ⋅ Zhekai Zhang ⋅ Guangxuan Xiao ⋅ Chuang Gan ⋅ Song Han

Abstract
