Abstract
Textual grounding, i.e., linking words to objects in images, is a challenging
but important task for robotics and human-computer interaction. Existing
techniques benefit from recent progress in deep learning and generally
formulate the task as a supervised learning problem, selecting a bounding box
from a set of possible options. Training these deep-net-based approaches
requires access to a large-scale dataset; however, constructing such a
dataset is time-consuming and expensive. Therefore, we develop a completely
unsupervised mechanism for textual grounding, using hypothesis testing to
link words to detected image concepts. We demonstrate our approach on the
ReferIt Game dataset and the Flickr30k dataset, outperforming baselines by
7.98% and 6.96%, respectively.
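To illustrate the general idea of hypothesis testing for word-concept linking, here is a minimal sketch: for each (word, concept) pair, the null hypothesis is that the concept is detected at its corpus-wide base rate regardless of whether the word appears, and we link the pair when a one-sided binomial test rejects the null. This is an assumption-laden illustration, not the authors' exact procedure; all names (`corpus`, `link_word_to_concept`, the toy data) are hypothetical.

```python
# Hypothetical sketch: link a word to a detected image concept when their
# co-occurrence exceeds chance under a one-sided exact binomial test.
# Not the paper's exact method; an illustration of the stated idea.
from math import comb

def binom_tail(k, n, p):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def link_word_to_concept(word, concept, corpus, alpha=0.01):
    """corpus: list of (set_of_caption_words, set_of_detected_concepts).
    Null: the concept fires at its base rate independently of the word;
    rejecting the null links the word to the concept."""
    n_total = len(corpus)
    base = sum(1 for _, c in corpus if concept in c) / n_total  # base detection rate
    with_word = [(w, c) for w, c in corpus if word in w]
    n = len(with_word)                                   # images containing the word
    k = sum(1 for _, c in with_word if concept in c)     # co-occurrences
    if n == 0 or base in (0.0, 1.0):
        return False  # degenerate cases: no evidence either way
    return binom_tail(k, n, base) < alpha

# Toy usage: the word "dog" co-occurs with the detected "dog" concept
# far above its base rate, so the test links them.
corpus = [({"a", "dog", "runs"}, {"dog", "grass"})] * 8 + \
         [({"a", "cat", "sits"}, {"cat", "sofa"})] * 8
print(link_word_to_concept("dog", "dog", corpus))  # True on this toy data
```

Because the test relies only on co-occurrence counts between caption words and detector outputs, no bounding-box annotations are needed, which is what makes the linking step unsupervised.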