Poster

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Yilong Zhao ⋅ Jiaming Tang ⋅ Kan Zhu ⋅ Zihao Ye ⋅ Chi-Chih Chang ⋅ Chaofan Lin ⋅ Jongseok Park ⋅ Guangxuan Xiao ⋅ Mohamed Abdelfattah ⋅ Mingyu Gao ⋅ Baris Kasikci ⋅ Song Han ⋅ Ion Stoica

Abstract
