Abstract
Convolutional neural networks (CNNs) are one of the driving forces behind the
advancement of computer vision. Despite their promising performance on many
tasks, CNNs still face major obstacles on the road to achieving ideal machine
intelligence. One is that CNNs are complex and hard to interpret. Another is
that standard CNNs require large amounts of annotated data, which are often
hard to obtain; it is therefore desirable to learn to recognize objects from
only a few examples. In this work, we address these limitations of CNNs by developing
novel, flexible, and interpretable models for few-shot learning. Our models are
based on the idea of encoding objects in terms of visual concepts (VCs), which
are interpretable visual cues represented by the feature vectors within CNNs.
We first adapt the learning of VCs to the few-shot setting, and then uncover
two key properties of feature encoding using VCs, which we call category
sensitivity and spatial pattern. Motivated by these properties, we present two
intuitive models for the problem of few-shot learning. Experiments show that
our models achieve competitive performance while being more flexible and
interpretable than alternative state-of-the-art few-shot learning methods. We
conclude that using VCs helps expose the natural capability of CNNs for
few-shot learning.