Abstract
In this work, we revisit atrous convolution, a powerful tool to explicitly
adjust a filter's field-of-view as well as control the resolution of feature
responses computed by Deep Convolutional Neural Networks, in the application of
semantic image segmentation. To handle the problem of segmenting objects at
multiple scales, we design modules which employ atrous convolution in cascade
or in parallel to capture multi-scale context by adopting multiple atrous
rates. Furthermore, we propose to augment our previously proposed Atrous
Spatial Pyramid Pooling module, which probes convolutional features at multiple
scales, with image-level features that encode global context, which further
boosts performance. We also elaborate on implementation details and share our
experience in training our system. The proposed "DeepLabv3" system
significantly improves over our previous DeepLab versions without DenseCRF
post-processing and attains performance comparable to other state-of-the-art
models on the PASCAL VOC 2012 semantic image segmentation benchmark.
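
To illustrate how the atrous rate adjusts a filter's field-of-view, the following is a minimal one-dimensional sketch in plain Python. It is not the DeepLabv3 implementation: the function names `atrous_conv1d` and `effective_field_of_view`, the toy signal, and the rates 1, 2, and 4 are assumptions chosen for illustration only. With rate r, a filter of size k samples the input every r positions, so it spans k + (k - 1)(r - 1) input samples while keeping the same k weights.

```python
def effective_field_of_view(kernel_size, rate):
    """Number of input samples spanned by a filter at the given atrous rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous convolution without padding (hypothetical helper).

    Uses the cross-correlation form common in deep learning frameworks:
    the kernel is not flipped, and its taps are spaced `rate` samples apart.
    """
    k = len(kernel)
    span = effective_field_of_view(k, rate)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Sample the input every `rate` positions under the kernel taps.
        value = sum(kernel[j] * signal[start + j * rate] for j in range(k))
        outputs.append(value)
    return outputs


if __name__ == "__main__":
    signal = list(range(10))   # toy input: 0, 1, ..., 9
    kernel = [1, 1, 1]         # same three weights for every rate
    # Applying several rates to the same input mimics, in miniature,
    # probing features at multiple scales in parallel.
    for rate in (1, 2, 4):
        fov = effective_field_of_view(len(kernel), rate)
        print(rate, fov, atrous_conv1d(signal, kernel, rate))
```

Rate 1 reduces to ordinary convolution. Larger rates widen the context each output sees without adding weights, which is the property the cascaded and parallel modules described above rely on.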