Federated Learning (FL) has recently emerged as an overarching framework for distributed machine learning (ML) beyond data centers. FL, in both its cross-device and cross-silo settings, enables collaborative ML model training from originally isolated data without sacrificing data privacy. This potential has attracted explosive attention from the ML, computer systems, optimization, signal processing, wireless networking, data mining, computer architecture, and privacy and security communities.
FL-related research is penetrating almost every science and engineering discipline. However, as FL comes closer to deployment in real-world systems, many of today's open problems in FL cannot be solved by researchers in any single community. For example, designing the most efficient and reliable FL algorithms requires expertise from the systems, security, signal processing, and networking communities. Conversely, designing the most efficient and scalable computing and networking systems requires collaborative advances from the ML, data mining, and optimization communities.
In light of the differences in educational backgrounds, toolboxes, viewpoints, and design principles across these communities, this workshop aims to break down community barriers and bring researchers from the pertinent communities together to address open problems in FL. More importantly, it aims to stimulate discussion among experts from different fields, in both industry and academia, and to identify new problems that remain underexplored from an interdisciplinary perspective.

Thu 8:55 a.m. - 9:00 a.m.
Opening Remarks

Thu 9:00 a.m. - 9:40 a.m.
Invited Talk: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
We introduce ProxSkip -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $O(\kappa \log \frac{1}{\varepsilon})$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $O(\sqrt{\kappa} \log \frac{1}{\varepsilon})$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.
Speaker: Peter Richtarik
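As a rough illustration of the prox-skipping idea in the abstract, here is a minimal Python sketch of the single-machine update: a gradient step shifted by a control variate, an occasional (probability p) prox evaluation, and a control-variate correction. The function names, the toy lasso problem, and the parameter choices are illustrative assumptions, not the authors' exact pseudocode; in the federated setting the prox step would correspond to a communication (averaging) round.

```python
import numpy as np

def proxskip(grad_f, prox_psi, x0, gamma, p, num_iters, rng=None):
    """Sketch of a prox-skipping iteration for min_x f(x) + psi(x).

    grad_f(x):          gradient oracle of the smooth term f
    prox_psi(v, step):  proximal operator of psi with step size `step`
    gamma:              gradient step size; p: probability of evaluating the prox
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    h = np.zeros_like(x)                        # control variate ("shift")
    for _ in range(num_iters):
        x_hat = x - gamma * (grad_f(x) - h)     # shifted gradient step
        if rng.random() < p:                    # evaluate the expensive prox only sometimes
            x_new = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:                                   # skip the prox this iteration
            x_new = x_hat
        h = h + (p / gamma) * (x_new - x_hat)   # correct the shift
        x = x_new
    return x

# Toy usage: f(x) = 0.5 * ||Ax - b||^2, psi(x) = lam * ||x||_1 (prox = soft-thresholding).
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((30, 10)), rng.standard_normal(30), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_psi = lambda v, step: np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2         # 1/L for this quadratic
x_est = proxskip(grad_f, prox_psi, np.zeros(10), gamma, p=0.2, num_iters=2000)
```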

Thu 9:40 a.m. - 10:20 a.m.
Invited Talk: Three daunting challenges of federated learning: privacy leakage, label deficiency, and resource constraints
Federated learning (FL) has emerged as a promising approach to enable decentralized machine learning directly at the edge, in order to enhance users’ privacy, comply with regulations, and reduce development costs. In this talk, I will provide an overview of FL and highlight three fundamental challenges for landing FL into practice: (1) privacy and security guarantees for FL; (2) label scarcity at the edge; and (3) FL over resource-constrained edge nodes. I will also provide a brief overview of FedML (https://fedml.ai), which is a platform that enables zero-code, lightweight, cross-platform, and provably secure federated learning and analytics.
Speaker: Salman Avestimehr

Thu 10:20 a.m. - 11:00 a.m.
Invited Talk: Federated Learning for EdgeAI: New Ideas and Opportunities for Progress
EdgeAI aims at the widespread deployment of AI on edge devices. To this end, a critical requirement of future ML systems is to enable on-device automated training and inference in distributed settings, wherever and whenever data, devices, or users are present, without sending the (possibly sensitive) training data to the cloud or incurring long response times. Starting from these overarching considerations, we consider on-device distributed learning, the hardware it runs on, and their co-design to allow for efficient federated learning and resource-aware deployment on edge devices. We hope to convey the excitement of working in this problem space that brings together topics in ML, optimization, communications, and application-hardware (co-)design.
Speaker: Radu Marculescu

Thu 11:00 a.m. - 11:40 a.m.
Invited Talk: Model Based Deep Learning with Applications to Federated Learning
Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature, i.e., a lack of interpretability and the need for very large training sets. On the other hand, signal processing and communications have traditionally relied on classical statistical modeling techniques that utilize mathematical formulations representing the underlying physics, prior information, and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. Here we introduce various approaches to model-based learning which merge parametric models with optimization tools and classical algorithms, leading to efficient, interpretable networks trained from reasonably sized training sets. We then show how model-based signal processing can impact federated learning both in terms of communication efficiency and in terms of convergence properties. We will consider examples in image deblurring, super-resolution in ultrasound and microscopy, efficient communication systems, and efficient diagnosis of COVID-19 using X-ray and ultrasound.
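One widely used way to merge a classical iterative algorithm with learned parameters, in the spirit of the model-based learning described above, is algorithm unrolling. The PyTorch sketch below unrolls ISTA for sparse recovery with learnable per-layer step sizes and thresholds; the architecture, layer count, and problem setup are illustrative assumptions rather than the specific models discussed in this talk.

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """K unrolled ISTA iterations for min_x 0.5*||Ax - y||^2 + lam*||x||_1,
    with the step size and soft threshold of every iteration made learnable."""

    def __init__(self, A: torch.Tensor, num_layers: int = 10):
        super().__init__()
        self.register_buffer("A", A)
        L = float(torch.linalg.matrix_norm(A, ord=2) ** 2)   # Lipschitz constant of the data-fit gradient
        self.steps = nn.Parameter(torch.full((num_layers,), 1.0 / L))
        self.thresholds = nn.Parameter(torch.full((num_layers,), 0.1 / L))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        x = y.new_zeros(y.shape[0], self.A.shape[1])
        for step, thr in zip(self.steps, self.thresholds):
            grad = (x @ self.A.T - y) @ self.A                   # gradient of 0.5*||Ax - y||^2
            z = x - step * grad
            x = torch.sign(z) * torch.relu(torch.abs(z) - thr)   # learnable soft-thresholding
        return x

# Toy usage: train end-to-end against ground-truth sparse codes (training loop omitted).
A = torch.randn(40, 100)
model = UnrolledISTA(A)
x_hat = model(torch.randn(16, 40))   # shape: (16, 100)
```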

Thu 11:40 a.m. - 1:40 p.m.
Demo: Live demo session on FedML
A tutorial followed by a live demo (an interactive session in which participants run FL on our platform).

Thu 1:40 p.m. - 2:20 p.m.
Invited Talk
Speaker: Yiran Chen

Thu 2:20 p.m. - 3:00 p.m.
Invited Talk: On Lower Bounds of Distributed Learning with Communication Compression
There have been many recent works proposing new compressors for various distributed optimization settings. However, all state-of-the-art performance analyses come down to one of only two properties of compressors: unbiasedness or contraction. This leads to a natural question: If we want to improve the convergence rate of distributed optimization with communication compression, should we continue using those properties and focus on how to apply them more cleverly in distributed algorithms, or should we look for new compressor properties? To answer this question, we present theoretical performance lower bounds imposed by those two properties and then show that the lower bounds are nearly matched by a method that works with any compressor satisfying one of those two properties. Hence, future work shall look for a fundamentally new compressor property. This is joint work with Xinmeng Huang (UPenn), Yiming Chen (Alibaba), and Kun Yuan (Alibaba).
Speaker: Wotao Yin
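For intuition on the two compressor properties the abstract refers to, here is a small NumPy sketch assuming the standard textbook constructions: random-k sparsification (unbiased, E[C(x)] = x after rescaling) and top-k sparsification (contractive, ||C(x) - x||^2 <= (1 - k/d) ||x||^2). These are illustrative examples, not the constructions analyzed in the talk.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased compressor: keep k random coordinates, rescaled by d/k so that E[C(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x, k):
    """Contractive compressor: keep the k largest-magnitude coordinates,
    so that ||C(x) - x||^2 <= (1 - k/d) * ||x||^2 holds deterministically."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

# Quick numerical check of both properties on a random gradient-like vector.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
avg = np.mean([rand_k(g, 100, rng) for _ in range(2000)], axis=0)
print(np.linalg.norm(avg - g) / np.linalg.norm(g))        # small => unbiased on average
err = np.linalg.norm(top_k(g, 100) - g) ** 2
print(err <= (1 - 100 / 1000) * np.linalg.norm(g) ** 2)   # contraction bound holds
```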

Thu 3:00 p.m. - 5:00 p.m.
Poster session and best student poster competition (at Gather Town)
Nathalie Baracaldo Angel · Tianyi Chen · Carlee Joe-Wong

Thu 5:00 p.m. - 5:05 p.m.
Closing Remarks