Author of the publication

Investigating Bidimensional Downsampling in Vision Transformer Models.

ICIAP (2), volume 13232 of Lecture Notes in Computer Science, page 287-299. Springer, (2022)


Other publications of authors with the same name

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering. CoRR, (2023)

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates. Sensors, 23 (3): 1286 (February 2023)

Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model. IEEE Trans. Image Process., 27 (10): 5142-5154 (2018)

Transform, Warp, and Dress: A New Transformation-guided Model for Virtual Try-on. ACM Trans. Multim. Comput. Commun. Appl., 18 (2): 62:1-62:24 (2022)

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval. CBMI, page 64-70. ACM, (2022)

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data. ICIAP (1), volume 14233 of Lecture Notes in Computer Science, page 245-256. Springer, (2023)

A unified cycle-consistent neural model for text and image retrieval. Multim. Tools Appl., 79 (35-36): 25697-25721 (2020)

The (R)Evolution of Multimodal Large Language Models: A Survey. CoRR, (2024)

Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training. CoRR, (2023)

Visual saliency for image captioning in new multimedia services. ICME Workshops, page 309-314. IEEE Computer Society, (2017)