(4 events)
Oral
Thu Apr 08 03:20 PM -- 03:40 PM (PDT)
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters
Shaohuai Shi · Xianhao Zhou · Shutao Song · Xingyao Wang · Zilin Zhu · Xue Huang · Xinan Jiang · Feihu Zhou · Zhenyu Guo · Liqiang Xie · Rui Lan · Xianbin Ouyang · Yan Zhang · Jieqian Wei · Jing Gong · Weiliang Lin · Ping Gao · Peng Meng · Xiaomin Xu · Chenyang Guo · Bo Yang · Zhibo Chen · Yongjian Wu · Xiaowen Chu
Oral
Thu Apr 08 03:40 PM -- 04:00 PM (PDT)
Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng · Shivam Bharuka · Isabel Gao · Mark Jeffrey · Vikram Saraph · Bor-Yiing Su · Caroline Trippel · Jiyan Yang · Mike Rabbat · Brandon Lucia · Carole-Jean Wu
Oral
Thu Apr 08 04:00 PM -- 04:20 PM (PDT)
Wavelet: Efficient DNN Training with Tick-Tock Scheduling
Guanhua Wang · Kehan Wang · Kenan Jiang · Xiangjun Li · Ion Stoica
Oral
Thu Apr 08 04:20 PM -- 04:40 PM (PDT)
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson · Vitaliy Chiley · Abhinav Venigalla · Joel Hestness · Urs Koster