Oral Thu, May 21, 2026 • 3:30 PM – 3:45 PM PDT

GUARD: Scalable Straggler Detection and Node Health Management for Large-Scale Training

Guanliang Liu ⋅ Abhinandan Patni ⋅ Congzhu Lin ⋅ Zoe Zeng ⋅ Jack Wittmayer ⋅ Yinghong Liu ⋅ Josh Wu ⋅ Anthony Ko ⋅ Alexander Zhipa ⋅ Ashvin Nihalani ⋅ Binxuan Huang ⋅ Cong Cheng ⋅ Mi Sun ⋅ Vijay Rajakumar ⋅ Rejith Joseph ⋅ Parthasarathy Govindarajen

Abstract
