Oral

Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization

Jie Zhao · Xiong Gao · Ruijie Xia · Zhaochuang Zhang · Deshi Chen · Lei Chen · Renwei Zhang · Zhen Geng · Bin Cheng · Xuefeng Jin

Exhibit Hall A
Session: ML Compilers & Runtime
Mon 29 Aug 2:33 p.m. — 2:51 p.m. PDT

We study operator fusion for deep neural networks (DNNs) in Apollo, a just-in-time (JIT) compilation framework. Apollo considers both memory-bound and compute-bound tensor operators for fusion and tightly integrates graph-level node grouping with operator-level loop fusion, which widens the fusion search space. Apollo lets the downstream loop optimizer send feedback upward, forcing the graph engine to regenerate partition patterns amenable to the downstream pass and thereby resolving the scalability issue. Beyond data locality, Apollo also exploits parallelism between independent tensor operators, further improving the performance of DNN workloads. Experimental results on training workloads show that Apollo outperforms TensorFlow and XLA by 1.86× and 1.37×, respectively, on a single GPU, and by 1.96× and 1.18× on multiple GPUs. Apollo also improves the performance of a vendor-provided DNN framework by 19.7% on a domain-specific accelerator. In addition, results on inference workloads demonstrate the general applicability of our fusion framework.
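
As a rough illustration of the feedback idea in the abstract, the toy sketch below groups a linear chain of operators and lets a stand-in "loop optimizer" veto a candidate group, which makes the partitioner start a new group. It is not Apollo's algorithm: the `Op` type, the `can_fuse` rule (at most one compute-bound operator per group), and the operator names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    compute_bound: bool  # True for e.g. matmul/conv, False for elementwise ops


def can_fuse(group):
    """Hypothetical stand-in for the loop optimizer's verdict.

    Assumed rule: a group is accepted only if it contains at most one
    compute-bound operator.
    """
    return sum(op.compute_bound for op in group) <= 1


def partition(ops):
    """Greedily grow groups along a linear chain of operators.

    When the (toy) loop optimizer rejects a candidate group, the current
    group is closed and a new one is started with the rejected operator.
    """
    groups, current = [], []
    for op in ops:
        candidate = current + [op]
        if can_fuse(candidate):
            current = candidate
        else:
            groups.append(current)
            current = [op]
    if current:
        groups.append(current)
    return groups


# Hypothetical operator chain, for illustration only.
chain = [
    Op("matmul", True),
    Op("bias_add", False),
    Op("relu", False),
    Op("matmul2", True),
    Op("softmax", False),
]
names = [[op.name for op in group] for group in partition(chain)]
print(names)
# [['matmul', 'bias_add', 'relu'], ['matmul2', 'softmax']]
```

In this sketch the feedback is a single boolean per candidate group. A real compiler would work on a general dataflow graph and could return richer information, such as which operator made the group unfusable.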
