Oral

Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost

Haojun Xia ⋅ Xiaoxia Wu ⋅ Jisen Li ⋅ Tsai-chuan Wu ⋅ Junxiong Wang ⋅ Jue Wang ⋅ Chenxi Li ⋅ Aman Singhal ⋅ Alay Dilipbhai Shah ⋅ ⋅ Donglin Zhuang ⋅ Zhongzhu Zhou ⋅ Ben Athiwaratkun ⋅ Zhen Zheng ⋅ Shuaiwen Song

Abstract
