Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations.

, , , , and . ACL (Findings), pp. 144-160. Association for Computational Linguistics, (2023)


Other publications of persons with the same name

Saliency based proposal refinement in robotic vision., , and . RCAR, pp. 85-90. IEEE, (2017)

Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network., , , , , and . TOMM, 15 (2s): 52:1-52:22 (2019)

Temporal Interaction and Causal Influence in Community-Based Question Answering., , , , , , and . IEEE Trans. Knowl. Data Eng., 29 (10): 2304-2317 (2017)

Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion., , , , , , and . SIGIR, pp. 833-836. ACM, (2017)

Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks., , , , , and . SIGIR, pp. 465-474. ACM, (2019)

Efficient location-based search of trajectories with location importance., , , and . Knowl. Inf. Syst., 45 (1): 215-245 (2015)

TaoHighlight: Commodity-Aware Multi-Modal Video Highlight Detection in E-Commerce., , , , , and . IEEE Trans. Multim., (2022)

Generation Method for Shaded Relief Based on Conditional Generative Adversarial Nets., , , , and . ISPRS Int. J. Geo Inf., 11 (7): 374 (2022)

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head., , , , , , , , , and 3 other author(s). CoRR, (2023)

Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering., , and . CoRR, (2022)