Abstract
Low-precision computation is often used to lower the time and energy cost of
machine learning, and recently hardware accelerators have been developed to
support it. Still, it has been used primarily for inference, not training.
Previous low-precision training algorithms suffered from a fundamental
tradeoff: as the number of bits of precision is lowered, quantization noise is
added to the model, which limits statistical accuracy. To address this issue,
we describe a simple low-precision stochastic gradient descent variant called
HALP. HALP converges at the same theoretical rate as full-precision algorithms
despite the noise introduced by using low precision throughout execution. The
key idea is to use SVRG to reduce gradient variance, and to combine this with a
novel technique called bit centering to reduce quantization error. We show that
on the CPU, HALP can run up to $4 \times$ faster than full-precision SVRG and
can match its convergence trajectory. We implemented HALP in TensorQuant, and
show that it exceeds the validation performance of plain low-precision SGD on
two deep learning tasks.
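To make the abstract's two ideas concrete, here is a minimal NumPy sketch of an SVRG loop with bit centering on a low-precision offset. All names, hyperparameters, and the quantization scheme are illustrative assumptions, not the paper's implementation: each epoch re-centers around the current iterate, computes the full gradient in high precision, and then runs low-precision inner steps on an offset whose representable range shrinks as the full-gradient norm shrinks.

```python
import numpy as np

def quantize(x, scale, bits=8):
    # Simulated fixed-point representation: round to a grid with
    # 2^(bits-1)-1 levels covering [-scale, scale], clipping overflow.
    levels = 2 ** (bits - 1) - 1
    delta = scale / levels
    return np.clip(np.round(x / delta), -levels, levels) * delta

def halp_sketch(grad_fn, full_grad_fn, w0, lr=0.1, epochs=5, inner=50,
                bits=8, mu=1.0):
    # Illustrative HALP-style loop (details are assumptions). Each epoch
    # re-centers around the current iterate w_tilde: the full gradient is
    # computed in high precision, and the inner SVRG loop updates a
    # low-precision offset z = w - w_tilde whose representable range
    # shrinks with the full-gradient norm ("bit centering").
    w_tilde = np.asarray(w0, dtype=np.float64)
    for _ in range(epochs):
        g_tilde = full_grad_fn(w_tilde)        # high-precision full gradient
        scale = np.linalg.norm(g_tilde) / mu   # dynamic range for this epoch
        if scale == 0.0:
            break                              # already at a stationary point
        z = np.zeros_like(w_tilde)
        for _ in range(inner):
            w = w_tilde + z
            # SVRG-style variance-reduced stochastic gradient estimate
            v = grad_fn(w) - grad_fn(w_tilde) + g_tilde
            z = quantize(z - lr * v, scale, bits)
        w_tilde = w_tilde + z                  # re-center for the next epoch
    return w_tilde
```

On a strongly convex problem the quantization grid spacing contracts together with the distance to the optimum, which is why the noise added by low precision does not cap the final accuracy, matching the convergence claim above.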