Abstract: A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent research proposes gradient and model compression methods. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD across more than 200 realistic distributed setups. Surprisingly, we observe that in only 6 of the more than 200 setups do gradient compression methods provide a speedup over optimized synchronous data-parallel training in the typical data-center setting. We conduct an extensive investigation to identify the root causes of this phenomenon, and offer a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Based on our analysis, we propose a list of desirable properties that gradient compression methods should satisfy in order to provide meaningful utility.
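The abstract does not spell out the performance model itself; the sketch below is only an illustrative back-of-the-envelope model (not the authors' model) of the kind of reasoning involved: compression pays off only when the communication time it saves exceeds the encode/decode overhead it adds. All function names, parameters, and numbers are hypothetical assumptions.

```python
# Illustrative back-of-the-envelope model (NOT the paper's model) for estimating
# when gradient compression can speed up synchronous data-parallel SGD.
# All names and numbers below are hypothetical, for illustration only.

def step_time_dense(t_compute, grad_bytes, bandwidth_bytes_per_s, n_workers):
    """Per-step time with an uncompressed ring all-reduce of the full gradient."""
    # A ring all-reduce moves roughly 2 * (n - 1) / n of the gradient per worker.
    comm = 2 * (n_workers - 1) / n_workers * grad_bytes / bandwidth_bytes_per_s
    return t_compute + comm

def step_time_compressed(t_compute, grad_bytes, bandwidth_bytes_per_s,
                         n_workers, compression_ratio, t_encode, t_decode):
    """Per-step time with compressed gradients exchanged among all workers.

    Many compression schemes are not all-reduce compatible, so this sketch
    assumes an all-gather-style exchange of compressed gradients instead.
    """
    compressed_bytes = grad_bytes / compression_ratio
    comm = (n_workers - 1) * compressed_bytes / bandwidth_bytes_per_s
    return t_compute + t_encode + comm + t_decode

if __name__ == "__main__":
    # Hypothetical setup: 8 workers, 400 MB of gradients, 25 Gbps links,
    # 100 ms of compute per step, 100x compression, 30 ms encode and decode.
    dense = step_time_dense(0.100, 400e6, 25e9 / 8, 8)
    comp = step_time_compressed(0.100, 400e6, 25e9 / 8, 8, 100, 0.030, 0.030)
    print(f"dense: {dense * 1e3:.1f} ms, compressed: {comp * 1e3:.1f} ms")
    print("compression helps" if comp < dense else "compression does not help")
```

Under a model like this, faster interconnects or smaller models shrink the dense communication term, so the fixed encode/decode overhead increasingly dominates and compression stops paying off.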
Author Information
Saurabh Agarwal (University of Wisconsin-Madison)
Hongyi Wang (Carnegie Mellon University)
Shivaram Venkataraman (University of Wisconsin-Madison)
Dimitris Papailiopoulos (University of Wisconsin-Madison)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: On the Utility of Gradient Compression in Distributed Training Systems
  Tue. Aug 30th 09:51 -- 10:09 PM, Room: Exhibit Hall A
More from the Same Authors
- 2023 Workshop: Workshop on Federated Learning Systems
  Dimitris Stripelis · Chaoyang He · Hongyi Wang · Tian Li · Praneeth Vepakomma · Bo Li
- 2023 Poster: Cuttlefish: Low-rank Model Training without All The Tuning
  Hongyi Wang · Saurabh Agarwal · Pongsakorn U-chupala · Yoshiki Tanaka · Eric Xing · Dimitris Papailiopoulos
- 2021 Poster: Adaptive Gradient Communication via Critical Learning Regime Identification
  Saurabh Agarwal · Hongyi Wang · Kangwook Lee · Shivaram Venkataraman · Dimitris Papailiopoulos
- 2021 Oral: Adaptive Gradient Communication via Critical Learning Regime Identification
  Saurabh Agarwal · Hongyi Wang · Kangwook Lee · Shivaram Venkataraman · Dimitris Papailiopoulos
- 2021 Poster: Pufferfish: Communication-efficient Models At No Extra Cost
  Hongyi Wang · Saurabh Agarwal · Dimitris Papailiopoulos
- 2021 Oral: Pufferfish: Communication-efficient Models At No Extra Cost
  Hongyi Wang · Saurabh Agarwal · Dimitris Papailiopoulos
- 2020 Oral: Blink: Fast and Generic Collectives for Distributed ML
  Guanhua Wang · Shivaram Venkataraman · Amar Phanishayee · Nikhil Devanur · Jorgen Thelin · Ion Stoica
- 2020 Poster: Blink: Fast and Generic Collectives for Distributed ML
  Guanhua Wang · Shivaram Venkataraman · Amar Phanishayee · Nikhil Devanur · Jorgen Thelin · Ion Stoica