Abstract
Various methods of measuring unit selectivity have been developed with the
aim of better understanding how neural networks work. But the different
measures provide divergent estimates of selectivity, and this has led to
different conclusions regarding the conditions in which selective object
representations are learned and the functional relevance of these
representations. In an attempt to better characterize object selectivity, we
undertake a comparison of various selectivity measures on a large set of units
in AlexNet, including localist selectivity, precision, class-conditional mean
activity selectivity (CCMAS), network dissection, the human interpretation of
activation maximization (AM) images, and standard signal-detection measures. We
find that the different measures provide different estimates of object
selectivity, with precision and CCMAS measures providing misleadingly high
estimates. Indeed, the most selective units had a poor hit-rate or a high
false-alarm rate (or both) in object classification, making them poor object
detectors. We fail to find any units that are even remotely as selective as the
'grandmother cell' units reported in recurrent neural networks. To
generalize these results, we compare selectivity measures on units that have
been described as 'object detectors' in VGG-16 and GoogLeNet trained on the
ImageNet or Places-365 datasets. Again, we find poor hit-rates and high
false-alarm rates for object classification. We conclude that signal-detection
measures provide a better assessment of single-unit selectivity than
common alternative approaches, and that deep convolutional networks trained for
image classification do not learn object detectors in their hidden layers.
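
The contrast the abstract draws between precision and signal-detection measures can be illustrated with a small sketch. It is not the paper's code: the function names, the toy activations, the top-k definition of precision (the share of a unit's k most active images that belong to a single class), and the fixed activation threshold are all assumptions chosen for illustration.

```python
from collections import Counter

def top_k_precision(activations, labels, k):
    """Return (class, fraction) for the most common class among the k most active images.

    Assumed definition of 'precision' for this sketch, not taken from the paper.
    """
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)[:k]
    cls, count = Counter(labels[i] for i in ranked).most_common(1)[0]
    return cls, count / k

def hit_and_false_alarm_rate(activations, labels, target, threshold):
    """Treat the unit as a detector for `target` that fires when activation > threshold."""
    n_target = sum(1 for y in labels if y == target)
    n_other = len(labels) - n_target
    hits = sum(1 for a, y in zip(activations, labels)
               if y == target and a > threshold)
    false_alarms = sum(1 for a, y in zip(activations, labels)
                       if y != target and a > threshold)
    return hits / n_target, false_alarms / n_other

# Hypothetical toy unit: it responds strongly to 3 of 10 "dog" images
# and weakly to everything else (7 remaining dogs, 90 non-dog images).
activations = [0.90, 0.95, 0.92] + [0.1] * 7 + [0.2] * 90
labels = ["dog"] * 10 + ["other"] * 90

print(top_k_precision(activations, labels, k=3))   # ('dog', 1.0)
print(hit_and_false_alarm_rate(activations, labels, "dog", threshold=0.5))  # (0.3, 0.0)
```

In this toy case the unit looks perfectly selective by top-3 precision, yet it detects only 30% of the dog images, which is the kind of gap between measures that the abstract describes.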
Description
[2007.01062] Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes?