Poster

GUARD: SCALABLE STRAGGLER DETECTION AND NODE HEALTH MANAGEMENT FOR LARGE-SCALE TRAINING

Guanliang Liu ⋅ Abhinandan Patni ⋅ ⋅ Zoe Zeng ⋅ ⋅ ⋅ ⋅ ⋅ Alexander Zhipa ⋅ Ashvin Nihalani ⋅ Binxuan Huang ⋅ Cong Cheng ⋅ ⋅ Vijay Rajakumar ⋅ Rejith Joseph ⋅ Parthasarathy Govindarajen

Abstract
