Contributed 1
in
Workshop: 2nd On-Device Intelligence Workshop
A Flexible, Extensible Software Framework for Model Compression Based on the LC Algorithm (Yerlan Idelbayev, University of California, Merced)
Compression of the neural network models has become an important systems problem for practical machine learning workflows. While various compression mechanisms and algorithms have been proposed to address the issue, many solutions rely on highly specialized procedures and require substantial domain knowledge to use efficiently. To streamline the compression to a large body of users, we propose an extensible open-source library based on the ideas of learning-compression (LC) algorithm—the LC toolkit. The software is written in Python using Pytorch and currently supports multiple forms of pruning, quantization, and low-rank compressions that can be applied to the model’s parts individually or in combination to reduce model’s size, computational requirements, or the on-device inference time. The toolkit’s versatility comes from the separation of the model learning from the model compression in the LC algorithm: once the learning (L) step is given, any compression (C) steps can be used for the model.