Abstract
Weakly supervised learning with only coarse labels can produce visual explanations of a deep neural network, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. Within one common framework, we address three shortcomings of previous approaches in modeling such attention maps: we (1) make attention maps an explicit and natural component of end-to-end training for the first time, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak supervision and extra supervision when the latter is available. Despite its simplicity, our method proves effective in experiments on the semantic segmentation task; we clearly surpass the state of the art on the PASCAL VOC 2012 val and test sets. Moreover, the proposed framework not only explains where the learner focuses but also feeds direct guidance back to it for specific tasks. Under mild assumptions, our method can also be understood as a plug-in for existing weakly supervised learners that improves their generalization performance.
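
To make the self-guidance idea concrete, the following is a minimal PyTorch-style sketch of one plausible instantiation, not necessarily the authors' exact formulation: a Grad-CAM-style attention map for the ground-truth class is computed inside the training graph, used to softly erase the attended image regions, and the network is then penalized for still recognizing the class in the erased image, which pushes the map to cover the whole object. All identifiers (model.backbone, model.head) and the soft-thresholding hyperparameters sigma and omega are hypothetical.

import torch
import torch.nn.functional as F

def gradcam_attention(features, class_score):
    # Channel weights from the gradient of the class score w.r.t. the
    # feature maps (Grad-CAM style); create_graph=True keeps the map
    # differentiable so it can be improved end-to-end.
    grads = torch.autograd.grad(class_score, features, create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)          # (B, C, 1, 1)
    attn = F.relu((weights * features).sum(dim=1, keepdim=True))
    # Normalize to [0, 1] so the map can act as a soft mask.
    return attn / (attn.amax(dim=(2, 3), keepdim=True) + 1e-8)

def self_guidance_loss(model, images, labels, sigma=0.5, omega=10.0):
    features = model.backbone(images)                       # (B, C, H, W)
    logits = model.head(features)                           # head = pool + fc
    cls_loss = F.cross_entropy(logits, labels)

    # Attention map for the ground-truth class, kept inside the graph.
    score = logits.gather(1, labels[:, None]).sum()
    attn = gradcam_attention(features, score)
    mask = torch.sigmoid(omega * (attn - sigma))            # soft threshold
    mask = F.interpolate(mask, size=images.shape[2:], mode='bilinear',
                         align_corners=False)

    # Self-guidance: erase the attended regions and penalize any remaining
    # confidence in the true class, forcing the map to cover the object.
    erased = images * (1.0 - mask)
    erased_logits = model.head(model.backbone(erased))
    mining_loss = torch.sigmoid(erased_logits).gather(1, labels[:, None]).mean()
    return cls_loss + mining_loss

Because the attention map stays differentiable in this sketch, minimizing mining_loss directly improves the map itself; when extra pixel-level supervision is available, it could be incorporated as an additional loss on attn, which matches the framework's claim of bridging weak and extra supervision.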