Abstract
Big data has contributed substantially to the success of deep learning in computer
vision. Recent work suggests that there is significant further potential to
increase object detection performance by utilizing even bigger datasets. In
this paper, we introduce the EuroCity Persons dataset, which provides a large
number of highly diverse, accurate and detailed annotations of pedestrians,
cyclists and other riders in urban traffic scenes. The images for this dataset
were collected on board a moving vehicle in 31 cities across 12 European countries.
With over 238,200 person instances manually labeled in over 47,300 images,
EuroCity Persons is nearly one order of magnitude larger than person datasets
used previously for benchmarking. The dataset furthermore contains a large
number of person orientation annotations (over 211,200). We optimize four
state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3)
to serve as baselines for the new object detection benchmark. In experiments
with previous datasets we analyze the generalization capabilities of these
detectors when trained with the new dataset. We furthermore study the effect of
the training set size, the dataset diversity (day- vs. night-time, geographical
region), the dataset detail (i.e. availability of object orientation
information) and the annotation quality on the detector performance. Finally,
we analyze error sources and discuss the road ahead.