Abstract
We propose CSRNet, a network for Congested Scene Recognition, as a
data-driven deep learning method that can understand highly congested
scenes, perform accurate count estimation, and present high-quality
density maps. The proposed CSRNet is composed of two major components: a
convolutional neural network (CNN) as the front-end for 2D feature extraction
and a dilated CNN for the back-end, which uses dilated kernels to deliver
larger receptive fields and to replace pooling operations. CSRNet is easy
to train because of its purely convolutional structure. To the best of our
knowledge, CSRNet is the first implementation using dilated CNNs for crowd
counting tasks. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset,
the UCF\_CC\_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and
achieve state-of-the-art performance. On the ShanghaiTech Part\_B dataset,
CSRNet achieves 47.3\% lower mean absolute error (MAE) than the previous
state-of-the-art method. We also extend the target applications to counting
other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet
significantly improves the output quality, with 15.4\% lower MAE than the
previous state-of-the-art approach.
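
The claim that dilated kernels enlarge the receptive field without pooling can be illustrated with a small calculation. The following is a minimal sketch, not the CSRNet architecture itself: the helper names `effective_kernel` and `receptive_field` and the two three-layer stacks are illustrative assumptions, and only the standard relation that a kernel of size k with dilation d spans d(k-1)+1 input positions is used.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, dilation, stride).
Layer = Tuple[int, int, int]


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span of input positions covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (along one axis) of a stack of convolution layers."""
    rf = 1    # a single output position sees one input position to start
    jump = 1  # distance in input pixels between adjacent positions at this depth
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks chosen for illustration only (not the paper's layers).
plain = [(3, 1, 1)] * 3    # three ordinary 3x3 convolutions
dilated = [(3, 2, 1)] * 3  # three 3x3 convolutions with dilation rate 2

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 13
```

Both stacks keep stride 1, so the output resolution is unchanged, and both use the same number of weights per kernel; only the dilation rate differs, yet the receptive field grows from 7 to 13 pixels per axis.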