Abstract
Recent advances in Generative Adversarial Networks (GANs) have shown
impressive results for the task of facial expression synthesis. The most
successful architecture is StarGAN, which conditions the GAN's generation
process on images of a specific domain, namely a set of images of persons
sharing the same expression. While effective, this approach can only generate
a discrete number of expressions, determined by the content of the dataset. To
address this limitation, in this paper we introduce a novel GAN conditioning
scheme based on Action Unit (AU) annotations, which describe in a continuous
manifold the anatomical facial movements defining a human expression. Our
approach allows controlling the magnitude of activation of each AU and
combining several of them.
Additionally, we propose a fully unsupervised strategy to train the model,
which only requires images annotated with their activated AUs, and exploit
attention mechanisms that make our network robust to changing backgrounds and
lighting conditions. An extensive evaluation shows that our approach surpasses
competing conditional generators both in its capability to synthesize a much
wider range of expressions, ruled by anatomically feasible muscle movements,
and in its capacity to deal with images in the wild.