

Poster

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin · Haotian Tang · Shang Yang · Zhekai Zhang · Guangxuan Xiao · Chuang Gan · Song Han

