50 entries in total
- [1] Deep Video Understanding with Video-Language Model. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 9551-9555
- [3] LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 23119-23129
- [4] Verbs in Action: Improving verb understanding in video-language models. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 15533-15545
- [5] Egocentric Video-Language Pretraining. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [6] DeVAn: Dense Video Annotation for Video-Language Models. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 14305-14321
- [7] VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 4227-4239
- [8] ViLA: Efficient Video-Language Alignment for Video Question Answering. COMPUTER VISION - ECCV 2024, PT LXII, 2025, 15120: 186-204
- [9] VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models. COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128: 331-348
- [10] VidLA: Video-Language Alignment at Scale. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 14043-14055