Abstract
In this paper, we first investigate why typical two-stage methods are not as
fast as single-stage, fast detectors like YOLO and SSD. We find that Faster
R-CNN and R-FCN perform intensive computation after or before RoI warping.
Faster R-CNN involves two fully connected layers for RoI recognition, while
R-FCN produces a large set of score maps. Thus, these networks are slow
because of the heavy-head design in their architecture. Even if we
significantly shrink the base model, the computation cost does not decrease
accordingly.
We propose a new two-stage detector, Light-Head R-CNN, to address the
shortcoming in current two-stage approaches. In our design, we make the head of
the network as light as possible by using a thin feature map and a cheap R-CNN
subnet (pooling and a single fully connected layer). Our ResNet-101 based
Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while
keeping time efficiency. More importantly, by simply replacing the backbone
with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at
102 FPS on COCO, significantly outperforming single-stage, fast detectors like
YOLO and SSD on both speed and accuracy. Code will be made publicly available.
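
To make the heavy-head versus light-head contrast concrete, the following minimal sketch counts per-RoI multiply-accumulate operations for a head built from fully connected layers. The helper names (`fc_macs`, `head_macs`) and all layer sizes are illustrative assumptions rather than figures from this abstract; the sketch only shows how a thin pooled feature and a single fully connected layer shrink the head cost.

```python
from typing import Sequence


def fc_macs(in_features: int, out_features: int) -> int:
    """Multiply-accumulates for one fully connected layer (bias ignored)."""
    return in_features * out_features


def head_macs(pooled_channels: int, pool_size: int, fc_widths: Sequence[int]) -> int:
    """Per-RoI MACs for a head that flattens a pooled RoI feature
    (pooled_channels x pool_size x pool_size) and applies FC layers in order."""
    in_features = pooled_channels * pool_size * pool_size
    total = 0
    for width in fc_widths:
        total += fc_macs(in_features, width)
        in_features = width  # output of this layer feeds the next one
    return total


# Hypothetical sizes, chosen only for illustration:
# a "heavy" head with a thick pooled feature and two wide FC layers,
# and a "light" head with a thin pooled feature and one FC layer.
heavy = head_macs(pooled_channels=512, pool_size=7, fc_widths=[4096, 4096])
light = head_macs(pooled_channels=10, pool_size=7, fc_widths=[2048])

print(f"heavy head: {heavy:,} MACs per RoI")
print(f"light head: {light:,} MACs per RoI")
print(f"ratio: {heavy / light:.1f}x")
```

Because this cost is paid once per RoI, it is multiplied by the number of proposals per image, which is why shrinking only the backbone leaves a heavy head dominating the runtime.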