Abstract
Data augmentation is a widely used trick when training deep neural networks:
in addition to the original data, properly transformed data are also added to
the training set. However, to the best of our knowledge, a clear mathematical
framework to explain the performance benefits of data augmentation is not
available.
In this paper, we develop such a theoretical framework. We show that data
augmentation is equivalent to an averaging operation over the orbits of a
certain group that keeps the data distribution approximately invariant, and
we prove that this averaging leads to variance reduction. We study empirical
risk minimization and the examples of exponential families, linear
regression, and certain two-layer neural networks. We also discuss how data
augmentation could be used in problems with symmetry where other approaches
are prevalent, such as in cryo-electron microscopy (cryo-EM).
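
To make the orbit-averaging claim concrete, here is a minimal sketch in
notation we introduce for illustration (the symbols $G$, $\ell$, and
$\hat{R}_{\mathrm{aug}}$ are our own labels, not taken from the paper): for a
finite group $G$ acting on the sample space and leaving the data distribution
invariant, the augmented empirical risk replaces each loss term by its average
over the orbit of the data point,

\[
  \hat{R}_{\mathrm{aug}}(\theta)
  \;=\;
  \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|G|} \sum_{g \in G} \ell(\theta;\, g X_i).
\]

Under exact invariance the inner sum is the conditional expectation of
$\ell(\theta; X_i)$ given the orbit of $X_i$, so $\hat{R}_{\mathrm{aug}}$
remains unbiased for the population risk, and the law of total variance gives
$\operatorname{Var}[\hat{R}_{\mathrm{aug}}(\theta)] \le
\operatorname{Var}[\hat{R}_{n}(\theta)]$, a Rao-Blackwell-type variance
reduction of the kind stated above.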