Abstract
We introduce Bayesian Bits, a practical method for joint mixed-precision
quantization and pruning through gradient-based optimization. Bayesian Bits
employs a novel decomposition of the quantization operation, which sequentially
considers doubling the bit width. At each new bit width, the residual error
between the full-precision value and the previously rounded value is quantized.
We then decide whether or not to add this quantized residual error for a higher
effective bit width and lower quantization noise. Because it starts from a
power-of-two bit width, this decomposition always produces hardware-friendly
configurations and, through an additional 0-bit option, serves as a unified
view of pruning and quantization. Bayesian Bits then
introduces learnable stochastic gates, which collectively control the bit width
of the given tensor. As a result, we can obtain low-bit solutions by performing
approximate inference over the gates, with prior distributions that encourage
most of them to be switched off. We further show that, under some assumptions,
L0 regularization of the network parameters corresponds to a specific instance
of the aforementioned framework. We experimentally validate our proposed method
on several benchmark datasets and show that we can learn pruned, mixed-precision
networks that provide a better trade-off between accuracy and efficiency than
their static bit-width equivalents.
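
To make the residual decomposition described above concrete, the following is a minimal NumPy sketch of the forward pass for a fixed gate configuration; it is an illustration of the idea, not the authors' implementation. The function name bayesian_bits_quantize, the gates dictionary, the clipping range (lo, hi), and the particular step-size recursion are assumptions chosen so that each doubled bit width refines the previous grid; the actual method additionally learns the gates stochastically under a prior that encourages switching them off.

import numpy as np

def bayesian_bits_quantize(x, lo, hi, gates):
    """Quantize x on [lo, hi] by sequentially doubling the bit width.

    gates maps the bit widths {2, 4, 8, 16, 32} to 0/1 decisions. The 2-bit
    gate doubles as the 0-bit (pruning) switch; each later gate decides
    whether the quantized residual error of the next, doubled bit width is
    added back in.
    """
    x = np.clip(x, lo, hi)

    # Base quantization at 2 bits.
    step = (hi - lo) / (2 ** 2 - 1)
    out = np.round((x - lo) / step) * step + lo

    if not gates[2]:
        # 0-bit option: the whole tensor is pruned.
        return np.zeros_like(x)

    for bits in (4, 8, 16, 32):
        if not gates[bits]:
            # Once a gate is off, all higher bit widths are unused.
            break
        # Shrink the step so the finer grid nests inside the coarser one
        # (one choice consistent with doubling the effective bit width).
        step = step / (2 ** (bits // 2) + 1)
        residual = np.round((x - out) / step) * step  # quantized residual error
        out = out + residual  # effective bit width is now `bits`
    return out

# Toy usage: quantize a tensor to an effective 8 bits (gates 2, 4, 8 switched on).
x = np.random.uniform(-1.0, 1.0, size=5)
gates = {2: 1, 4: 1, 8: 1, 16: 0, 32: 0}
print(bayesian_bits_quantize(x, lo=-1.0, hi=1.0, gates=gates))
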
Description
[2005.07093] Bayesian Bits: Unifying Quantization and Pruning