SONAR: Benchmarking Topology and Collaboration in Decentralized Learning
Abstract
The performance, efficiency, and reliability of decentralized machine learning hinge on systems factors such as network topology, communication budget, and device heterogeneity, yet existing frameworks treat these as fixed or opaque. Federated learning remains centrally orchestrated, while peer-to-peer (P2P) approaches lack a unified foundation for analyzing how topology and system design jointly shape learning outcomes. We present \textbf{SONAR}, a systems framework for reproducible, topology-aware decentralized learning. SONAR unifies communication, topology, and telemetry in a layered architecture that supports multiple backends (gRPC, MPI, WebRTC), static and adaptive graphs, and per-node logging of bandwidth, latency, and collaboration dynamics. Using SONAR, we make three observations: (1) topology and its graph-level statistics show no consistent or linear correlation with learning performance across accuracy, robustness, and privacy metrics, underscoring the need to study topology as an independent systems variable; (2) under realistic constraints such as limited communication rounds or bandwidth, topology governs how quickly information propagates, producing performance differences of up to ≈20% between graph families; and (3) adaptive neighbor selection can induce collaborator collapse, a failure mode in which network diversity erodes over time. By exposing topology as a first-class experimental dimension, SONAR enables systematic, reproducible evaluation of decentralized learning across performance, efficiency, and robustness regimes.