@dblp

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video.

, , , and . EMNLP, page 7380-7402. Association for Computational Linguistics, (2023)

Links and resources

Tags