Zero redundancy distributed learning with differential privacy
Abstract
Deep learning with large models has achieved great success across a wide range of domains. However, training models with billions of parameters is very challenging in terms of training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime of differential privacy (DP). On the one hand, DP optimization is comparably efficient to standard non-DP optimization on a single GPU, yet existing DP distributed learning is significantly less efficient on multiple GPUs. On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution for standard distributed learning, but making it compatible with DP is technically complicated. In this work, we develop a new systematic solution, DP-ZeRO, (I) to scale up the trainable DP model size, e.g., to GPT-100B, (II) to achieve the same computation and communication efficiency as standard ZeRO, and (III) to enable mixed-precision DP training. Like standard ZeRO, our DP-ZeRO has the potential to train models of arbitrary size and exhibits excellent training efficiency on large models. Code is available at \url{https://anonymous.4open.science/r/fast-differential-privacy-3B50}.