Oral

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training

Wenxuan Li ⋅ Chengruidong Zhang ⋅ Huiqiang Jiang ⋅ Yucheng Li ⋅ ⋅ Lili Qiu

Abstract
