50 entries in total
- [21] Test of Time: Instilling Video-Language Models with a Sense of Time. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 2503-2516
- [22] Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022, pp. 5026-5035
- [23] PAXION: Patching Action Knowledge in Video-Language Foundation Models. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
- [24] Learning Unified Video-Language Representations via Joint Modeling and Contrastive Learning for Natural Language Video Localization. 2023 International Joint Conference on Neural Networks (IJCNN), 2023
- [25] Depth-Aware Sparse Transformer for Video-Language Learning. Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023, pp. 4778-4787
- [27] Clover: Towards A Unified Video-Language Alignment and Fusion Model. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 14856-14866
- [29] VideoCon: Robust Video-Language Alignment via Contrast Captions. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 13927-13937
- [30] Learning Trajectory-Word Alignments for Video-Language Tasks. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2504-2514