Oral

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Yilong Zhao · Jiaming Tang · Kan Zhu · Zihao Ye · Chi-Chih Chang · Chaofan Lin · Jongseok Park · Guangxuan Xiao · Mohamed Abdelfattah · Mingyu Gao · Baris Kasikci · Song Han · Ion Stoica

Abstract
