Abstract
In a variety of problems originating in supervised, unsupervised, and
reinforcement learning, the loss function is defined by an expectation over a
collection of random variables, which might be part of a probabilistic model or
the external world. Estimating the gradient of this loss function from
samples lies at the core of gradient-based learning algorithms for these
problems. We introduce the formalism of stochastic computation
graphs (directed acyclic graphs that include both deterministic functions and
conditional probability distributions) and describe how to easily and
automatically derive an unbiased estimator of the loss function's gradient. The
resulting algorithm for computing the gradient estimator is a simple
modification of the standard backpropagation algorithm. The generic scheme we
propose unifies estimators derived in a variety of prior work, along with
the variance-reduction techniques used there. It could assist researchers in
developing intricate models involving a combination of stochastic and
deterministic operations, enabling, for example, attention, memory, and control
actions.
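The abstract does not spell out the estimators themselves. As a hedged illustration (not the paper's algorithm, which derives estimators automatically through a modified backpropagation pass), the Python sketch below hand-codes two well-known unbiased gradient estimators for a one-node stochastic graph. The first is the score-function (likelihood-ratio) estimator, shown with and without a constant baseline, a common variance-reduction technique. The second is the pathwise (reparameterization) estimator. The toy model x ~ N(theta, 1), the cost f(x) = x^2, and all function names are assumptions chosen for illustration.

```python
import random


def f(x: float) -> float:
    # Cost node (assumed toy example): a deterministic function of the sample.
    return x * x


def score_function_grad(theta: float, n: int, baseline: float = 0.0, seed: int = 0) -> float:
    """Likelihood-ratio estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x)].

    For x ~ N(theta, 1), d/dtheta log p(x; theta) = x - theta.
    Subtracting a constant baseline keeps the estimate unbiased,
    because E[d/dtheta log p(x; theta)] = 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (f(x) - baseline) * (x - theta)
    return total / n


def pathwise_grad(theta: float, n: int, seed: int = 0) -> float:
    """Reparameterized estimate: x = theta + eps with eps ~ N(0, 1),
    so d f(x)/d theta = f'(x) = 2x for this particular f."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps
        total += 2.0 * x
    return total / n


if __name__ == "__main__":
    theta = 1.5
    # E[x^2] = theta^2 + 1, so the exact gradient is 2 * theta.
    exact = 2.0 * theta
    n = 200_000
    print("exact          ", exact)
    print("score function ", score_function_grad(theta, n))
    print("with baseline  ", score_function_grad(theta, n, baseline=theta**2 + 1))
    print("pathwise       ", pathwise_grad(theta, n))
```

All three estimates should land near the exact value of 3.0. For this toy model the per-sample variance drops from about 51.6 (plain score function) to 28 (with the baseline set to E[f(x)]) to 4 (pathwise), which illustrates why variance-reduction techniques matter when the same gradient can be estimated in several unbiased ways.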
Description
[1506.05254] Gradient Estimation Using Stochastic Computation Graphs