Shannonic: Efficient Entropy-Optimal Compression for ML Workloads
Kareem Ibrahim · Andreas Moshovos
Abstract
We present Shannonic, a lossless compression method for machine learning tensors that achieves near-entropy-optimal compression, a minimal state footprint, and high throughput. Shannonic uses an offline preprocessing step to partition the tensor value space into optimally selected subranges and generates encoding/decoding tables that encode each value as a (range index, offset) pair, where the range index is entropy coded using the asymmetric numeral systems (ANS) method. We formally prove and empirically show that Shannonic achieves higher compression efficiency than standard ANS. For a variety of 8b-quantized models, Shannonic's codec uses just 530B of state and achieves coding efficiency within 1\% of the Shannon limit. Shannonic enables 1.3-3.1$\times$ faster federated learning over bandwidth-constrained networks and a 29-32\% latency reduction in edge-cloud LLM inference.
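The (range index, offset) decomposition described above can be illustrated with a minimal sketch. The subrange boundaries, function names, and the fixed-width handling of offsets below are illustrative assumptions based only on the abstract, not the paper's actual tables or codec.

```python
# Minimal sketch of a (range index, offset) value decomposition over an
# 8-bit value space. Boundary choices here are hypothetical; in Shannonic
# they would be selected offline to approach the entropy limit.
import bisect

# Hypothetical subrange boundaries: [0,4), [4,16), [16,64), [64,256)
BOUNDARIES = [0, 4, 16, 64, 256]

def encode_value(v: int) -> tuple[int, int]:
    """Map an 8-bit value to a (range index, offset) pair."""
    idx = bisect.bisect_right(BOUNDARIES, v) - 1
    offset = v - BOUNDARIES[idx]
    return idx, offset

def decode_value(idx: int, offset: int) -> int:
    """Invert the (range index, offset) mapping."""
    return BOUNDARIES[idx] + offset

if __name__ == "__main__":
    for v in (0, 3, 17, 200):
        idx, off = encode_value(v)
        assert decode_value(idx, off) == v
        print(f"value={v:3d} -> range={idx}, offset={off}")
    # In Shannonic, the stream of range indices would then be entropy coded
    # with ANS, while offsets are stored as raw bits whose width depends on
    # the subrange size.
```

In this sketch the skewed, low-entropy part of the statistics is concentrated in the range indices, which is why entropy coding only those indices (rather than full values) can keep the codec's state and tables small.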