Abstract
Important gains have recently been obtained in object detection by using
training objectives that focus on hard negative examples, i.e., negative
examples that are currently rated as positive or ambiguous by the detector.
These examples can strongly influence parameters when the network is trained to
correct them. Unfortunately, they are often sparse in the training data, and
are expensive to obtain. In this work, we show how large numbers of hard
negatives can be obtained automatically by analyzing the output of a
trained detector on video sequences. In particular, detections that are
isolated in time, i.e., that have no associated preceding or following
detections, are likely to be hard negatives. We describe simple procedures for
mining large numbers of such hard negatives (and also hard positives)
from unlabeled video data. Our experiments show that retraining detectors on
these automatically obtained examples often significantly improves performance.
We present experiments on multiple architectures and multiple data sets,
including face detection, pedestrian detection and other object categories.
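
As a rough illustration of the isolated-in-time idea described above, the following minimal sketch flags detections that have no overlapping detection in the immediately preceding or following frame. The association rule (intersection-over-union against adjacent frames), the `min_iou` threshold, and the function names are assumptions made for this example; the abstract does not specify how detections are associated across frames, and the sketch covers hard negatives only, not hard positives.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def isolated_detections(
    frames: List[List[Box]], min_iou: float = 0.5
) -> List[Tuple[int, Box]]:
    """Return (frame_index, box) pairs with no match in adjacent frames.

    Assumption: two detections are "associated" when their IoU in
    neighbouring frames is at least min_iou. Such isolated detections
    are candidate hard negatives.
    """
    isolated = []
    for t, boxes in enumerate(frames):
        # Collect detections from the previous and next frame, if present.
        neighbours: List[Box] = []
        if t > 0:
            neighbours.extend(frames[t - 1])
        if t + 1 < len(frames):
            neighbours.extend(frames[t + 1])
        for box in boxes:
            if not any(iou(box, other) >= min_iou for other in neighbours):
                isolated.append((t, box))
    return isolated


# Usage: three frames; the second detection in frame 1 appears only once.
frames = [
    [(0, 0, 10, 10)],
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11)],
]
candidates = isolated_detections(frames)
```

In this toy input, only the box that appears in a single frame is returned as a candidate. A real pipeline would likely need a more robust association step than frame-to-frame IoU, since fast-moving objects can fail the overlap test.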