

Oral Thu, May 21, 2026 • 3:30 PM – 3:45 PM PDT

NodeSweep: Practical Straggler Detection and Health Monitoring for Large-Scale Foundation Model Training

Guanliang Liu ⋅ Zoe Zeng ⋅ ⋅ Cong Cheng ⋅ ⋅ ⋅ ⋅ ⋅ Abhinandan Patni ⋅ vijay rajakumar ⋅ Alexander Zhipa ⋅ Ashvin Nihalani ⋅ Binxuan Huang ⋅ Parthasarathy Govindarajen ⋅

Abstract
