Machine learning models, especially large language models such as GPT-3 and generative models for image synthesis such as Stable Diffusion, are today trained primarily in centralized data centers, on thousands of GPUs running for weeks, if not months. Inference is not cheap either: given their staggering size, these models are often served on expensive cutting-edge GPUs hosted in a centralized data center. This centralized paradigm is not only expensive but also greatly limits accessibility for the rest of the research community. Inspired by the success of volunteer computing and federated learning projects such as SETI@home, Folding@home, and FedML, decentralized and collaborative machine learning is a promising alternative. If we could exploit under-utilized, globally geo-distributed GPUs and edge devices, we would share one of the most powerful “supercomputers” in the world and could use it for the next generation of open models!
In recent years, there has been significant progress in decentralized and collaborative learning. This includes new
theoretical and algorithmic developments (e.g., [1, 2, 3, 4]), and practical deployments including Training
Transformer Together [5] and Petals [6]. Together with recent advancements in cryptography, secure computation,
and blockchain technology, we see a path to realizing this decentralized vision for machine learning!
However, there are still many challenging technical problems in front of us, including (1) efficient training and
inference over slow networks, (2) practical verification over untrusted devices, (3) providing privacy and security
guarantees, (4) developing incentive mechanisms, and (5) real-world deployment on blockchains. Tackling these
challenges requires expertise and collaboration from many different communities, not only from machine learning,
systems, security, and privacy, but also from economics, blockchain, and Web3. This workshop aims to bring
leading experts from these different communities together, discuss how these areas can come together to enable a
decentralized learning paradigm, and lay out important directions for future work and concrete cross-community
collaborations.
The topics of this workshop include but are not limited to:
● New algorithms for decentralized and collaborative learning
● Communication-efficient learning algorithms
● System design and optimizations for decentralized learning
● Learning over untrusted and potentially malicious devices
● Verification of computation in the context of decentralized learning
● Mechanism design and implementation for decentralized learning
● Incentive schemes (e.g., environmental, economic, accessibility-related) for collaborative learning
● Security and privacy for decentralized learning
● Blockchain and Web3 technology for decentralized learning
Thu 5:55 a.m. - 6:00 a.m. | Opening Remarks
Thu 6:00 a.m. - 6:40 a.m. | Building Machine Learning Models like Open-Source Software with git-theta [Colin Raffel & Nikhil Kandpal] (Invited Talk)
Pre-trained models have become a cornerstone of machine learning because they are applicable to a huge range of downstream applications. However, these models are typically created by resource-rich research groups that unilaterally decide how a given model should be built, trained, and released, after which point it is never updated. In contrast, open-source development has demonstrated that a community of contributors can work together to iteratively build complex and widely used software. This kind of large-scale distributed collaboration is made possible by a mature set of tools, including version control and package management. This talk will discuss our research that aims to make it possible to build machine learning models the way open-source software is developed. After briefly discussing our work on merging models, model patches, and modular architectures, we will provide a thorough overview of git-theta, our version control system for model parameters. git-theta integrates into the standard git workflow, supports cheaply communicable patches, and natively handles automatic merging. The talk will conclude with a brief demo of git-theta's functionality.
Thu 6:40 a.m. - 7:20 a.m. | Contribution and Fairness-Aware Federated Learning [Han Yu] (Invited Talk)
Federated Learning (FL) is an emerging area of AI focused on training machine learning models in a privacy-preserving manner. The success of FL, especially in open collaboration settings, rests on being able to continuously attract high-quality data owners to participate. At the same time, this opens FL up to adversaries trying to exploit other parties' sensitive private information. It is important to adopt an ecosystem-management approach to building trust and controlling risk in FL. In this talk, I will share some attempts we have made at the Trustworthy Federated Ubiquitous Learning (TrustFUL) Research Lab in this general direction, including data valuation under FL settings, fair treatment of FL participants, and studying user reactions to incentive schemes developed for federated learning.
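To make the idea of data valuation in FL concrete, here is a minimal Python sketch, not the lab's actual method: clients are valued by the utility lost when their data is left out (a leave-one-out approximation of contribution), and such values can then weight a FedAvg-style aggregation. The function names and the toy coverage utility are illustrative assumptions.

```python
import numpy as np

def federated_average(updates, weights):
    """Weighted FedAvg-style combination of client model updates."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize contributions
    return sum(w * np.asarray(u, dtype=float) for w, u in zip(weights, updates))

def leave_one_out_value(utility, clients):
    """Value each client by the utility lost when it is excluded."""
    full = utility(clients)
    return {c: full - utility([x for x in clients if x != c]) for c in clients}

# Toy utility: number of distinct examples a coalition of clients covers.
data = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}
coverage = lambda coalition: len(set().union(*[data[c] for c in coalition]))

values = leave_one_out_value(coverage, list(data))   # {'a': 0, 'b': 1, 'c': 0}
```

Here client "b" is the only one holding example 4, so it is the only client with positive leave-one-out value; in practice utility would be validation accuracy of the jointly trained model rather than raw coverage.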
Thu 7:40 a.m. - 8:20 a.m. | Security and Robustness of Collaborative Learning Systems [Anwar Hithnawi] (Invited Talk)
In recent years, secure collaborative machine learning paradigms have emerged as a viable option for sensitive applications. By eliminating the need to centralize data, these paradigms protect data sovereignty and reduce the risks associated with large-scale data collection. However, they also expose the learning process to active attackers, amplifying robustness issues. In this talk, I will discuss the security and robustness challenges of secure collaborative learning systems, present our efforts to mitigate some of these issues, and highlight why a definitive solution to robustness in these systems is challenging.
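One standard building block for robustness in this setting, shown here as an illustrative example rather than the speaker's approach, is a Byzantine-tolerant aggregation rule such as the coordinate-wise median, which bounds the influence of a minority of corrupted updates:

```python
import numpy as np

def coordinate_median(updates):
    """Aggregate client updates by coordinate-wise median; a minority of
    arbitrarily corrupted updates cannot drag the result far from the
    honest majority."""
    return np.median(np.stack([np.asarray(u, dtype=float) for u in updates]), axis=0)

# Four honest updates near [1, 1] plus one poisoned update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.05, 0.95], [100.0, -100.0]]
robust = coordinate_median(updates)   # stays close to [1, 1]
naive = np.mean(updates, axis=0)      # dragged toward the outlier
```

The naive mean is pulled far off by the single poisoned update, while the median stays near the honest cluster; this robustness is exactly what becomes hard to combine with secure (encrypted) aggregation, since the server can no longer inspect individual updates.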
Thu 8:20 a.m. - 9:00 a.m. | Poisoning Web-Scale Training Datasets is Practical [Florian Tramèr] (Invited Talk)
Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. We introduce two new dataset poisoning attacks that intentionally introduce malicious examples to degrade a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. We will discuss how the attacks work; why (we think) they haven't been exploited yet; and why defending against them comes with non-negligible costs.
Thu 11:00 a.m. - 11:40 a.m. | Example Selection for Distributed Learning [Chris De Sa] (Invited Talk)
Training example order in SGD has long been known to affect the convergence rate. Recent results show that accelerated rates are possible in a variety of cases for permutation-based sample orders, in which each example from the training set is used once before any example is reused. This talk will cover a line of work in my lab on decentralized learning and sample-ordering schemes. We will discuss the limits of the classic gossip algorithm and of random-reshuffling schemes, and explore how both can be improved to make SGD converge faster, in theory and in practice, with little overhead.
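For reference, here is a minimal sketch of the random-reshuffling order the abstract refers to; the function and toy problem are illustrative assumptions, not the talk's algorithms. Each epoch draws a fresh permutation, so every example is used exactly once before any is reused.

```python
import numpy as np

def sgd_random_reshuffle(grad, x0, data, lr=0.1, epochs=50, seed=0):
    """SGD with random reshuffling: each epoch visits every example
    exactly once, in a fresh random permutation (no replacement)."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):   # without-replacement order
            x -= lr * grad(x, data[i])
    return x

# Toy least-squares problem: per-example gradient of 0.5*(x - d)^2 is x - d,
# so the global minimizer is the mean of the data (here, 2.0).
data = [1.0, 2.0, 3.0]
x_final = sgd_random_reshuffle(lambda x, d: x - d, x0=0.0, data=data)
```

Compared with sampling examples with replacement, this ordering makes per-epoch progress more uniform, which is what enables the accelerated rates mentioned above.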
Thu 11:40 a.m. - 12:20 p.m. | DataComp: In search of the next generation of multimodal datasets [Ludwig Schmidt] (Invited Talk)
Multimodal datasets are a critical component of recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources, then evaluate their dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai.
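The filtering track described above amounts to ranking candidate pairs by some quality signal and keeping the best fraction. A minimal Python sketch, assuming image-text similarity scores (e.g., CLIP scores) have already been computed; the function name and data are hypothetical stand-ins:

```python
def filter_top_fraction(pairs, scores, keep_frac=0.3):
    """Keep the top fraction of image-text pairs, ranked by a
    precomputed image-text similarity score."""
    ranked = sorted(zip(scores, pairs), key=lambda t: t[0], reverse=True)
    k = max(1, int(len(ranked) * keep_frac))
    return [pair for _, pair in ranked[:k]]

candidates = ["p1", "p2", "p3", "p4", "p5"]    # stand-ins for (image, caption) pairs
clip_scores = [0.10, 0.90, 0.50, 0.30, 0.70]   # hypothetical precomputed scores
kept = filter_top_fraction(candidates, clip_scores, keep_frac=0.4)   # ['p2', 'p5']
```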
Thu 12:40 p.m. - 1:20 p.m. | Accommodating LLM training over decentralized computational resources [Binhang Yuan] (Invited Talk)
Training algorithms for large language models are often communication-heavy. As a result, these models are trained predominantly in centralized environments such as data centers with fast network connections. This strong dependency on fast interconnects is becoming the limiting factor for further scaling in the data-center setting, and it rules out alternative decentralized infrastructures such as spot instances and geo-distributed volunteer compute. In this talk, I will discuss our research on communication-efficient distributed learning and our current effort to train foundation models in a decentralized way.
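One common communication-efficient primitive in this space, given here as an illustrative example rather than the specific method from the talk, is top-k gradient sparsification: each worker transmits only its k largest-magnitude gradient entries per step, cutting communication volume dramatically over slow links.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude gradient entries; the rest are
    zeroed before communication (a full implementation would also keep
    the residual in a local error-feedback buffer)."""
    grad = np.asarray(grad, dtype=float)
    idx = np.argsort(np.abs(grad))[-k:]   # indices of the top-k entries
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

g = [0.1, -5.0, 0.3, 2.0, -0.2]
compressed = topk_sparsify(g, k=2)        # [0., -5., 0., 2., 0.]
```

Sending only the (index, value) pairs of the surviving entries reduces per-step traffic from the full model dimension to O(k), which is what makes training over slow, geo-distributed networks plausible.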