Estimating the Information Gap Between Textual and Visual Representations

C. Henning and R. Ewerth. Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pages 14--22. New York, NY, USA, ACM (2017)
DOI: 10.1145/3078971.3078991

Abstract

Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.

Links and resources

BibTeX key: henning2017estimating
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
year: 2017
pages: 14--22
publisher: ACM
series: ICMR '17
acmid: 3078991
isbn: 978-1-4503-4701-3
location: Bucharest, Romania
numpages: 9
DOI: 10.1145/3078971.3078991
url: http://doi.acm.org/10.1145/3078971.3078991

Cite this publication

@inproceedings{henning2017estimating,
  abstract = {Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.},
  acmid = {3078991},
  added-at = {2017-06-28T10:15:22.000+0200},
  address = {New York, NY, USA},
  author = {Henning, Christian Andreas and Ewerth, Ralph},
  biburl = {https://www.bibsonomy.org/bibtex/288a22338d5fdb50d963c62e44f041eb1/jaeschke},
  booktitle = {Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval},
  doi = {10.1145/3078971.3078991},
  interhash = {e47db03662e562e73636d830f419a57a},
  intrahash = {88a22338d5fdb50d963c62e44f041eb1},
  isbn = {978-1-4503-4701-3},
  keywords = {classification image learning machine text},
  location = {Bucharest, Romania},
  numpages = {9},
  pages = {14--22},
  publisher = {ACM},
  series = {ICMR '17},
  timestamp = {2017-06-28T10:15:22.000+0200},
  title = {Estimating the Information Gap Between Textual and Visual Representations},
  url = {http://doi.acm.org/10.1145/3078971.3078991},
  year = 2017
}

BibSonomy
