Author of the publication

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.

ACL (1), page 6551-6557. Association for Computational Linguistics, (2019)


Other publications of authors with the same name

FLAVA: A Foundational Language And Vision Alignment Model. CoRR, (2021)
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. CoRR, (2020)
Modeling Relationships in Referential Expressions with Compositional Modular Networks. CVPR, page 4418-4427. IEEE Computer Society, (2017)
Scaling Language-Image Pre-Training via Masking. CVPR, page 23390-23400. IEEE, (2023)
FLAVA: A Foundational Language And Vision Alignment Model. CVPR, page 15617-15629. IEEE, (2022)
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. CoRR, (2019)
Explainable Neural Computation via Stack Neural Module Networks. ECCV (7), volume 11211 of Lecture Notes in Computer Science, page 55-71. Springer, (2018)
Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer. CoRR, (2021)
Exploring Long-Sequence Masked Autoencoders. CoRR, (2022)
LSDA: Large Scale Detection through Adaptation. NIPS, page 3536-3544. (2014)