Oral Thu, May 21, 2026 • 9:30 AM – 9:45 AM PDT

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Yilong Zhao ⋅ Jiaming Tang ⋅ Kan Zhu ⋅ Zihao Ye ⋅ Chi-Chih Chang ⋅ Chaofan Lin ⋅ Jongseok Park ⋅ Guangxuan Xiao ⋅ Mohamed Abdelfattah ⋅ Mingyu Gao ⋅ Baris Kasikci ⋅ Song Han ⋅ Ion Stoica

Abstract
