Poster

Venn: Resource Management Across Federated Learning Jobs

Jiachen Liu · Fan Lai · Eric Ding · Yiwen Zhang · Mosharaf Chowdhury


Abstract:

In recent years, federated learning (FL) has emerged as a promising approach for machine learning (ML) and data science across distributed edge devices. As the deployment of FL jobs increases, so does resource contention among multiple FL jobs. The ephemeral nature and resource heterogeneity of devices, coupled with the overlapping resource requirements of diverse FL jobs, complicate efficient device scheduling. Existing resource managers for FL jobs opt for random assignment of devices to FL jobs for simplicity and scalability, which hurts job efficiency. In this paper, we present Venn, an FL resource manager that efficiently schedules contended, ephemeral, heterogeneous devices among many FL jobs, with the goal of reducing their average job completion time (JCT). Venn formulates the Intersection Resource Scheduling (IRS) problem to identify complex resource contention among multiple FL jobs. Venn then proposes a contention-aware scheduling heuristic to minimize the average scheduling delay. Furthermore, it proposes a resource-aware device-to-job matching heuristic that optimizes response collection time by mitigating stragglers. Our evaluation shows that, compared to state-of-the-art FL resource managers, Venn improves the average JCT by up to 1.88×.
