Abstract
Most convolutional neural networks use some method for gradually downscaling
the size of the hidden layers. This is commonly referred to as pooling, and is
applied to reduce the number of parameters, improve invariance to certain
distortions, and increase the receptive field size. Since pooling by nature is
a lossy process, it is crucial that each such layer maintains the portion of
the activations that is most important for the network's discriminability. Yet
the standard choices remain simple maximization or averaging over blocks (max or
average pooling) or plain downsampling in the form of strided convolutions. In this
paper, we aim to leverage recent results on image downscaling for the purposes
of deep learning. Inspired by the human visual system, which focuses on local
spatial changes, we propose detail-preserving pooling (DPP), an adaptive
pooling method that magnifies spatial changes and preserves important
structural detail. Importantly, its parameters can be learned jointly with the
rest of the network. We analyze some of its theoretical properties and show its
empirical benefits on several datasets and networks, where DPP consistently
outperforms previous pooling approaches.
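
For readers less familiar with the baselines named above, the following is a minimal sketch of max pooling, average pooling, and plain subsampling on a single 2D activation map. It is illustrative only and is not the DPP method. The helper names `pool2x2` and `mean`, the 2x2 block size, and the toy `activations` values are assumptions introduced here, and plain subsampling merely stands in for a strided convolution with a fixed kernel.

```python
def pool2x2(image, reduce_fn):
    """Downscale a 2D list by applying reduce_fn to each non-overlapping 2x2 block."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce_fn([image[r][c], image[r][c + 1],
                       image[r + 1][c], image[r + 1][c + 1]])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical 4x4 activation map containing one sharp local detail (the 9).
activations = [
    [1, 1, 2, 2],
    [1, 9, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]

max_pooled = pool2x2(activations, max)            # keeps only the block maximum
avg_pooled = pool2x2(activations, mean)           # smooths the detail into its block
strided = [row[::2] for row in activations[::2]]  # keeps every second sample

print(max_pooled)
print(avg_pooled)
print(strided)
```

In this toy input, max pooling keeps the outlier but discards its surroundings, averaging dilutes it, and subsampling misses it entirely, which illustrates the sense in which each baseline is lossy.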