MultiGrain is a network architecture producing compact vector representations
that are suited both for image classification and particular object retrieval.
It builds on a standard classification trunk. The top of the network produces
an embedding containing coarse and fine-grained information, so that images can
be recognized based on the object class, particular object, or if they are
distorted copies. Our joint training is simple: we minimize a cross-entropy
loss for classification and a ranking loss that determines if two images are
identical up to data augmentation, with no need for additional labels. A key
component of MultiGrain is a pooling layer that allows us to take advantage of
high-resolution images with a network trained at a lower resolution.
When fed to a linear classifier, the learned embeddings provide
state-of-the-art classification accuracy. For instance, we obtain 79.3% top-1
accuracy with a ResNet-50 learned on ImageNet, which is a +1.7% absolute
improvement over the AutoAugment method. When compared with the cosine
similarity, the same embeddings perform on par with the state-of-the-art for
image retrieval at moderate resolutions.
Description
MultiGrain: a unified image embedding for classes and instances
%0 Generic
%1 berman2019multigrain
%A Berman, Maxim
%A Jégou, Hervé
%A Vedaldi, Andrea
%A Kokkinos, Iasonas
%A Douze, Matthijs
%D 2019
%K augmentation classification pooling retrieval semisup
%T MultiGrain: a unified image embedding for classes and instances
%U http://arxiv.org/abs/1902.05509
%X MultiGrain is a network architecture producing compact vector representations
that are suited both for image classification and particular object retrieval.
It builds on a standard classification trunk. The top of the network produces
an embedding containing coarse and fine-grained information, so that images can
be recognized based on the object class, particular object, or if they are
distorted copies. Our joint training is simple: we minimize a cross-entropy
loss for classification and a ranking loss that determines if two images are
identical up to data augmentation, with no need for additional labels. A key
component of MultiGrain is a pooling layer that allows us to take advantage of
high-resolution images with a network trained at a lower resolution.
When fed to a linear classifier, the learned embeddings provide
state-of-the-art classification accuracy. For instance, we obtain 79.3% top-1
accuracy with a ResNet-50 learned on ImageNet, which is a +1.7% absolute
improvement over the AutoAugment method. When compared with the cosine
similarity, the same embeddings perform on par with the state-of-the-art for
image retrieval at moderate resolutions.
@misc{berman2019multigrain,
abstract = {MultiGrain is a network architecture producing compact vector representations
that are suited both for image classification and particular object retrieval.
It builds on a standard classification trunk. The top of the network produces
an embedding containing coarse and fine-grained information, so that images can
be recognized based on the object class, particular object, or if they are
distorted copies. Our joint training is simple: we minimize a cross-entropy
loss for classification and a ranking loss that determines if two images are
identical up to data augmentation, with no need for additional labels. A key
component of MultiGrain is a pooling layer that allows us to take advantage of
high-resolution images with a network trained at a lower resolution.
When fed to a linear classifier, the learned embeddings provide
state-of-the-art classification accuracy. For instance, we obtain 79.3% top-1
accuracy with a ResNet-50 learned on ImageNet, which is a +1.7% absolute
improvement over the AutoAugment method. When compared with the cosine
similarity, the same embeddings perform on par with the state-of-the-art for
image retrieval at moderate resolutions.},
added-at = {2019-03-15T23:12:11.000+0100},
author = {Berman, Maxim and Jégou, Hervé and Vedaldi, Andrea and Kokkinos, Iasonas and Douze, Matthijs},
biburl = {https://www.bibsonomy.org/bibtex/29fb647b4f511705544e684f2f55d01d6/nmatsuk},
description = {MultiGrain: a unified image embedding for classes and instances},
interhash = {75f93c73befe665aa8ab9b4488496bf9},
intrahash = {9fb647b4f511705544e684f2f55d01d6},
keywords = {augmentation classification pooling retrieval semisup},
note = {cite arxiv:1902.05509},
timestamp = {2019-03-15T23:12:11.000+0100},
title = {MultiGrain: a unified image embedding for classes and instances},
url = {http://arxiv.org/abs/1902.05509},
year = 2019
}