

Contributed 2 in Workshop: 2nd On-Device Intelligence Workshop

TorchQuant: A Hackable Quantization Library For Researchers, By Researchers (Shyam A Tailor, University of Cambridge)

Shyam Tailor


Abstract:

Quantization is a popular technique for accelerating and compressing neural networks by using low-bit arithmetic to represent weights and activations. It remains an active area of research, with continued work on closing the accuracy gap between full-precision and low-precision models. We observe that researchers in this area tend to rely on custom implementations rather than the approaches built into popular machine learning libraries, because the built-in approaches are not sufficiently flexible to enable research. We are open-sourcing TorchQuant, our MIT-licensed library that builds upon PyTorch by providing researchers with modular components and implementations that will accelerate their research and provide the community with consistent baselines. Using our library, we provide an example of how to quickly evaluate a research hypothesis: the “range-precision” trade-off for quantization-aware training. Our library can be found at https://github.com/camlsys/torchquant.
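
The abstract does not define the “range-precision” trade-off, so the following is only a minimal sketch under one common reading of it: with a fixed bit width, a wider clipping range leaves fewer values clipped but makes each quantization step coarser. The sketch uses plain Python rather than TorchQuant or PyTorch, and the names `fake_quantize` and `mean_squared_error`, the symmetric uniform scheme, and the Gaussian test data are all illustrative assumptions, not part of the library's API.

```python
import random


def fake_quantize(x, clip, bits):
    """Hypothetical helper: symmetric uniform quantization of a scalar.

    The value is clipped to [-clip, clip], snapped to the nearest of the
    available integer levels, and mapped back to a float ("fake" quantization).
    """
    levels = 2 ** (bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    step = clip / levels              # wider range -> coarser step
    clipped = max(-clip, min(clip, x))
    return round(clipped / step) * step


def mean_squared_error(values, clip, bits):
    """Average squared error between values and their quantized versions."""
    return sum((v - fake_quantize(v, clip, bits)) ** 2 for v in values) / len(values)


# Assumed stand-in for a weight tensor: 10,000 samples from a unit Gaussian.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Sweep the clipping range at a fixed 4-bit width and report the error.
for clip in (0.5, 1.0, 2.0, 3.0, 4.0, 6.0):
    error = mean_squared_error(weights, clip, bits=4)
    print(f"clip={clip:>4}: 4-bit MSE = {error:.5f}")
```

In this toy setting, a very narrow range is dominated by clipping error and a very wide range by rounding error, so the lowest error falls somewhere in between. Quantization-aware training adds further considerations (for example, how gradients pass through the rounding step) that this sketch does not attempt to model.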