Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide a real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware, and existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It adopts adaptive MM grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes data movement by adopting vectorized, quantized, and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
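To make the two bottlenecks concrete: sparse convolution is typically executed as a gather, a dense matrix multiplication, and a scatter-accumulate per kernel offset, so the irregular gather/scatter steps dominate data movement while the per-offset GEMMs vary wildly in size. Below is a minimal NumPy sketch of this gather-GEMM-scatter dataflow; it is an illustration under assumed names (`sparse_conv_gather_gemm_scatter`, `in_maps`, `out_maps` are hypothetical), not TorchSparse's actual GPU implementation or API.

```python
import numpy as np

def sparse_conv_gather_gemm_scatter(feats, weights, in_maps, out_maps, n_out):
    """Reference sparse convolution via gather -> GEMM -> scatter-accumulate.

    feats:    (N_in, C_in) features of the active input points
    weights:  (K, C_in, C_out), one weight matrix per kernel offset
    in_maps:  per-offset lists of input row indices (illustrative name)
    out_maps: per-offset lists of matching output row indices
    n_out:    number of active output points
    """
    out = np.zeros((n_out, weights.shape[2]), dtype=feats.dtype)
    for k in range(weights.shape[0]):
        if len(in_maps[k]) == 0:
            continue                            # offset matches no point pairs
        gathered = feats[in_maps[k]]            # gather: irregular memory read
        partial = gathered @ weights[k]         # dense GEMM on gathered rows
        np.add.at(out, out_maps[k], partial)    # scatter-accumulate: irregular write
    return out
```

Because the number of matched point pairs differs per offset, the per-offset GEMMs have irregular shapes; TorchSparse's adaptive MM grouping batches them (padding some) to trade redundant computation for better GPU regularity, and its vectorized, quantized, fused memory access targets the gather/scatter lines above.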
Author Information
Haotian Tang (MIT)
Zhijian Liu (MIT)
Xiuyu Li (Cornell University)
Yujun Lin (MIT)
Song Han (MIT)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: TorchSparse: Efficient Point Cloud Inference Engine
  Mon. Aug 29th, 11:18 -- 11:36 PM, Room: Exhibit Hall A
More from the Same Authors
- 2021 Poster: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han
- 2021 Oral: IOS: Inter-Operator Scheduler for CNN Acceleration
  Yaoyao Ding · Ligeng Zhu · Zhihao Jia · Gennady Pekhimenko · Song Han