Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models

Younghoon Byun · Seungsik Moon · Baeseong Park · Se Jung Kwon · Dongsoo Lee · Gunho Park · Eunji Yoo · Jung Gyu Min · Youngjoo Lee

Ballroom B - Position 22


This paper presents a new algorithm-hardware co-optimization approach that maximizes memory bandwidth utilization even for the pruned deep neural network (DNN) models.Targeting the well-known model compression approaches, for the first time, we carefully investigate the memory interface overheads caused by the irregular data accessing patterns.Then, the sparsity-aware memory interface architecture is newly developed to regularly access all the data of pruned-DNN models stored with the state-of-the-art XORNet compression.Moreover, we introduce the novel stacked XORNet solution for minimizing the number of data imbalances, remarkably relaxing the interface costs without slowing the effective memory bandwidth.As a result, experimental results show that our co-optimized interface architecture can achieve almost the ideal model-accessing speed with reasonable hardware overheads, successfully allowing the high-speed pruned-DNN inference scenarios.

Chat is not available.