Abstract
We propose the idea of transferring common-sense knowledge from source
categories to target categories for scalable object detection. In our setting,
the training data for the source categories have bounding box annotations,
while those for the target categories only have image-level annotations.
Current state-of-the-art approaches focus on image-level visual or semantic
similarity to adapt a detector trained on the source categories to the new
target categories. In contrast, our key idea is to (i) use similarity not at
image-level, but rather at region-level, as well as (ii) leverage richer
common-sense (based on attribute, spatial, etc.,) to guide the algorithm
towards learning the correct detections. We acquire such common-sense cues
automatically from readily-available knowledge bases without any extra human
effort. On the challenging MS COCO dataset, we find that using common-sense
knowledge substantially improves detection performance over existing
transfer-learning baselines.