MLSys Poster Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

Poster

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

Ningning Xie · Tamara Norman · Dominik Grewe · Dimitrios Vytiniotis

Keywords: [ distributed and parallel learning ] [ programming languages for ml ]

[ Abstract ]

[ Paper PDF]

Abstract:

We present a novel characterization of the mapping of multiple parallelism forms(e.g. data and model parallelism) onto hierarchical accelerator systems that ishierarchy-aware and greatly reduces the space of software-to-hardware mapping.We experimentally verify the substantial effect of these mappings on all-reduceperformance (up to 448x). We offer a novel syntax-guided programsynthesis framework that is able to decompose reductions over one or moreparallelism axes to sequences of collectives in a hierarchy- and mapping-awareway. For 69% of parallelism placements and user requested reductions, ourframework synthesizes programs that outperform the default all-reduceimplementation when evaluated on different GPU hierarchies (max 2.04x,average 1.27x). We complement our synthesis tool with a simulatorexceeding 90% top-10 accuracy, which therefore reduces the need for massiveevaluations of synthesis results to determine a small set of optimal programsand mappings.

Chat is not available.