Author of the publication

UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding.

ICCV, page 18063-18073. IEEE, (2023)

Other publications of authors with the same name

FLAVA: A Foundational Language And Vision Alignment Model. CoRR, (2021)
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. CoRR, (2020)
Scaling Language-Image Pre-Training via Masking. CVPR, page 23390-23400. IEEE, (2023)
FLAVA: A Foundational Language And Vision Alignment Model. CVPR, page 15617-15629. IEEE, (2022)
Modeling Relationships in Referential Expressions with Compositional Modular Networks. CVPR, page 4418-4427. IEEE Computer Society, (2017)
Natural Language Object Retrieval. CoRR, (2015)
TextCaps: A Dataset for Image Captioning with Reading Comprehension. ECCV (2), volume 12347 of Lecture Notes in Computer Science, page 742-758. Springer, (2020)
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding. ICCV, page 18063-18073. IEEE, (2023)
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. ICCV, page 12508-12517. IEEE, (2021)
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation. ACL (1), page 6551-6557. Association for Computational Linguistics, (2019)