Oral Wed, May 20, 2026 • 3:30 PM – 3:45 PM PDT

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments

Yongjun He ⋅ Shuai Zhang ⋅ Jiading Gai ⋅ Xiyuan Zhang ⋅ Boran Han ⋅ Bernie Wang ⋅ Huzefa Rangwala ⋅ George Karypis


Abstract

As large language models (LLMs) continue to scale and new GPUs are released ever more frequently, there is growing demand for LLM post-training in heterogeneous environments, both to make full use of underutilized mid-range or previous-generation GPUs and to alleviate the shortage of homogeneous high-end GPUs within a single availability zone. However, achieving high-performance reinforcement learning (RL) training for LLMs on such resources remains challenging, because the workflow involves multiple models and tasks with complex computational and data dependencies. In this paper, we present HetRL, a distributed system for efficient RL training on infrastructures with heterogeneous GPUs and networks. HetRL formulates RL training scheduling in heterogeneous environments as a constrained joint optimization problem and provides two complementary approaches to solving it: (1) a hybrid scheduling algorithm that efficiently identifies near-optimal solutions, and (2) an integer linear programming (ILP)-based scheduling algorithm that obtains optimal solutions. Together, the two approaches enable flexible trade-offs between solution optimality and efficiency. Our extensive evaluation, consuming 20,000 GPU-hours, shows that HetRL achieves up to 9.17$\times$ the throughput of state-of-the-art systems, and 3.17$\times$ on average, across a wide range of workloads and settings.
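To make the idea of scheduling as a constrained optimization problem concrete, the following is a minimal toy sketch and not HetRL's actual formulation or algorithm. It assumes three hypothetical RL tasks and two hypothetical GPU pools with made-up speeds and memory capacities, treats tasks placed on the same pool as running one after another, and finds the placement with the smallest makespan by exhaustive search. All names and numbers (`TASKS`, `POOLS`, `best_placement`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical RL tasks: relative compute cost and memory footprint (GB).
TASKS = {
    "actor_generation": {"work": 8.0, "mem": 30},
    "reward_inference": {"work": 2.0, "mem": 10},
    "actor_training": {"work": 6.0, "mem": 40},
}

# Hypothetical GPU pools: relative speed and memory capacity (GB).
POOLS = {
    "new_gen": {"speed": 2.0, "mem": 80},
    "prev_gen": {"speed": 1.0, "mem": 48},
}


def best_placement(tasks, pools):
    """Return (makespan, placement) minimizing the slowest pool's total time.

    Exhaustive search over all task-to-pool assignments; feasible only for
    tiny instances, which is why real schedulers use heuristics or ILP solvers.
    """
    names = list(tasks)
    best = None
    for choice in product(pools, repeat=len(names)):
        load = {p: 0.0 for p in pools}  # time spent per pool
        mem = {p: 0 for p in pools}     # memory used per pool
        for task, pool in zip(names, choice):
            load[pool] += tasks[task]["work"] / pools[pool]["speed"]
            mem[pool] += tasks[task]["mem"]
        # Constraint: the tasks on a pool must fit in its memory.
        if any(mem[p] > pools[p]["mem"] for p in pools):
            continue
        makespan = max(load.values())
        if best is None or makespan < best[0]:
            best = (makespan, dict(zip(names, choice)))
    return best


makespan, placement = best_placement(TASKS, POOLS)
print(makespan, placement)
```

With these assumed numbers, the search places generation and reward inference on the faster pool and training on the slower one, for a makespan of 6.0 time units. The search space grows exponentially with the number of tasks, which illustrates why a trade-off between solution optimality and search cost arises in the first place.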
