Zusammenfassung
Dataset augmentation, the practice of applying a wide array of
domain-specific transformations to synthetically expand a training set, is a
standard tool in supervised learning. While effective in tasks such as visual
recognition, the set of transformations must be carefully designed,
implemented, and tested for every new domain, limiting its re-use and
generality. In this paper, we adopt a simpler, domain-agnostic approach to
dataset augmentation. We start with existing data points and apply simple
transformations such as adding noise, interpolating, or extrapolating between
them. Our main insight is to perform the transformation not in input space, but
in a learned feature space. A re-kindling of interest in unsupervised
representation learning makes this technique timely and more effective. It is a
simple proposal, but to-date one that has not been tested empirically. Working
in the space of context vectors generated by sequence-to-sequence models, we
demonstrate a technique that is effective for both static and sequential data.
Nutzer