Keywords: [ hardware-efficient ml ] [ efficient training ]
This paper proposes gyro dropout, a variant of dropout that improves the efficiency of training neural net-works. Instead of randomly dropping out neurons in every training iteration, gyro dropout pre-selects and trains a fixed number of subnetworks. Because each subnetwork is more stably trained, they are more diversified and thus their ensemble achieves good generalization. We further propose block-wise gyro dropout, or simply block-wise dropout, which is a GPU-friendly variant of gyro dropout. Block-wise dropout partitions hidden neurons into a number of groups that should be dropped out together throughout learning; this makes it efficient to prune the corresponding warp executions on GPUs. We evaluate the two dropout methods with seven neural networks and ten public datasets. In our evaluation, gyro dropout improves the accuracy of trained models by up to 1.93%; gyro dropout consistently achieves higher accuracy than conventional dropout in all experiments. Moreover, block-wise dropout speeds up the training of neural networks by up to 29.8% with little to no accuracy loss. Ourimplementation of gyro dropout is publicly available at https://github.com/mlsys-seo/gyro-dropout.