Abstract
Pixel-level annotation demands expensive human effort and limits the
performance of deep networks, which usually benefit from more such training
data. In this work we aim to achieve high-quality instance and semantic
segmentation results using a small set of pixel-level mask annotations and a
large set of box annotations. The basic idea is to exploit detection models to
simplify the pixel-level supervised learning task and thus reduce the required
amount of mask annotations. Our architecture, named DASNet, consists of three
modules: detection, attention, and segmentation. The detection module detects
objects of all classes, the attention module generates multi-scale
class-specific features, and the segmentation module recovers the binary masks.
Our method demonstrates substantially improved performance compared to existing
semi-supervised approaches on the PASCAL VOC 2012 dataset.
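The detection → attention → segmentation flow described above can be sketched as a toy pipeline. This is purely illustrative: every module body below is a hypothetical placeholder (dummy detector, simple crop-based attention, threshold-based mask), not the networks used in the paper.

```python
# Hypothetical sketch of a three-stage detection -> attention -> segmentation
# pipeline. All module internals are placeholders, not DASNet's actual models.
import numpy as np

def detect(image):
    # Placeholder detector: returns a list of (class_id, box) pairs.
    h, w = image.shape[:2]
    return [(0, (0, 0, w // 2, h // 2))]  # one dummy detection

def attend(image, detections):
    # Placeholder attention: crop a class-specific region per detection
    # (the real module would produce multi-scale class-specific features).
    return [(cls, image[y0:y1, x0:x1])
            for cls, (x0, y0, x1, y1) in detections]

def segment(regions):
    # Placeholder segmentation: threshold each region into a binary mask.
    return [(cls, (region > region.mean()).astype(np.uint8))
            for cls, region in regions]

image = np.random.rand(64, 64)
masks = segment(attend(image, detect(image)))
```

The design intent mirrored here is that detection narrows the problem to class-specific regions, so the segmentation stage only has to recover a binary mask per region, which is the simpler task the abstract says reduces the needed mask annotations.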