Multimodal Learning with Transformers: A Survey

P. Xu, X. Zhu, and D. Clifton. (May 2023). arXiv:2206.06488 [cs].

Abstract

Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.

Links and resources

BibTeX key: xu_multimodal_2023
entry type: misc
year: 2023
month: may
publisher: arXiv
annote: Comment: This paper is accepted by IEEE TPAMI
shorttitle: Multimodal Learning with Transformers
file: arXiv.org Snapshot:/Users/pascal/Zotero/storage/HGZN5MS5/2206.html:text/html;Full Text PDF:/Users/pascal/Zotero/storage/SEBZCPJP/Xu et al. - 2023 - Multimodal Learning with Transformers A Survey.pdf:application/pdf
urldate: 2023-07-10
url: http://arxiv.org/abs/2206.06488
note: arXiv:2206.06488 cs

Cite this publication

@misc{xu_multimodal_2023,
  abstract = {Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.},
  added-at = {2023-07-31T08:05:54.000+0200},
  annote = {Comment: This paper is accepted by IEEE TPAMI},
  author = {Xu, Peng and Zhu, Xiatian and Clifton, David A.},
  biburl = {https://www.bibsonomy.org/bibtex/2f5d56723678ce0ef73eda22dd8c2d35b/jascal_panetzky},
  file = {arXiv.org Snapshot:/Users/pascal/Zotero/storage/HGZN5MS5/2206.html:text/html;Full Text PDF:/Users/pascal/Zotero/storage/SEBZCPJP/Xu et al. - 2023 - Multimodal Learning with Transformers A Survey.pdf:application/pdf},
  interhash = {ca9c92a3e88db61bb8f9f2ad57f13f96},
  intrahash = {f5d56723678ce0ef73eda22dd8c2d35b},
  keywords = {- Computer Learning Machine Pattern Recognition, Science Vision and ecomodelling},
  month = may,
  note = {arXiv:2206.06488 [cs]},
  publisher = {arXiv},
  shorttitle = {Multimodal {Learning} with {Transformers}},
  timestamp = {2023-07-31T08:07:14.000+0200},
  title = {Multimodal {Learning} with {Transformers}: {A} {Survey}},
  url = {http://arxiv.org/abs/2206.06488},
  urldate = {2023-07-10},
  year = 2023
}

BibSonomy