ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Peter Richtárik
2022 Invited Talk
in
Workshop: Cross-Community Federated Learning: Algorithms, Systems and Co-designs
Abstract
We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is the proximal gradient descent (ProxGD) algorithm, which evaluates the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of the prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows the expensive prox operator to be skipped in most iterations: while its iteration complexity is $O(\kappa \log \frac{1}{\varepsilon})$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is only $O(\sqrt{\kappa} \log \frac{1}{\varepsilon})$. Our main motivation comes from federated learning, where evaluating the gradient corresponds to taking a local GD step independently on all devices, and evaluating the prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a large, provable improvement without any heterogeneity-bounding assumptions.
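To make the skipping mechanism concrete, the following is a minimal Python sketch of a ProxSkip-style loop, not the authors' reference implementation: it assumes a quadratic smooth part $f(x) = \frac{1}{2} x^\top A x - b^\top x$ and takes $\psi$ to be the $\ell_1$ norm, so that the prox is soft-thresholding; the function names, the control-variate vector h, the stepsize gamma, and the skipping probability p are illustrative choices used only to show how the prox can be called on a small fraction of iterations.

# Illustrative sketch of a ProxSkip-style update (not the authors' reference code).
# Assumptions: f(x) = 0.5 * x^T A x - b^T x is the smooth part, and psi is the
# L1 norm, so prox_{t*psi} is soft-thresholding.
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proxskip(A, b, x0, gamma, p, num_iters, rng=None):
    """Gradient step every iteration; prox applied only with probability p,
    using a control variate h to correct the skipped prox steps."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    h = np.zeros_like(x0)          # control variate shifting the gradient step
    prox_calls = 0
    for _ in range(num_iters):
        grad = A @ x - b           # gradient of the smooth part f
        x_hat = x - gamma * (grad - h)
        if rng.random() < p:       # prox is evaluated only on this rare branch
            x = prox_l1(x_hat - (gamma / p) * h, gamma / p)
            prox_calls += 1
        else:
            x = x_hat              # prox skipped in most iterations
        h = h + (p / gamma) * (x - x_hat)
    return x, prox_calls

# Example usage on a small synthetic problem with condition number kappa = 10.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(np.linspace(1.0, 10.0, 20))
    b = rng.standard_normal(20)
    x, calls = proxskip(A, b, np.zeros(20), gamma=1.0 / 10.0,
                        p=1.0 / np.sqrt(10.0), num_iters=500, rng=rng)
    print("prox evaluations:", calls)

With stepsize $\gamma = 1/L$ and skipping probability $p = 1/\sqrt{\kappa}$, the coin flip fires roughly once every $\sqrt{\kappa}$ iterations, which is the source of the $O(\sqrt{\kappa} \log \frac{1}{\varepsilon})$ prox-evaluation count quoted in the abstract.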
Speaker
Peter Richtárik
Peter Richtárik is a professor of Computer Science at the King Abdullah University of Science and Technology (KAUST), Saudi Arabia, where he leads the Optimization and Machine Learning Lab. His research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, and high-performance computing. Through his work on randomized and distributed optimization algorithms, he has contributed to the foundations of machine learning, optimization, and randomized numerical linear algebra. He is one of the original developers of Federated Learning. Prof Richtárik’s work has attracted international awards, including a Best Paper Award at the NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning, a Distinguished Speaker Award at the 2019 International Conference on Continuous Optimization, the SIAM SIGEST Best Paper Award, and the IMA Leslie Fox Prize (three times). Several of his works are among the most-read papers published by the SIAM Journal on Optimization and the SIAM Journal on Matrix Analysis and Applications. Prof Richtárik serves as an Area Chair for leading machine learning conferences, including NeurIPS, ICML, and ICLR, and is an Action Editor of Transactions on Machine Learning Research and an Associate Editor of Optimization Methods and Software.