50 entries in total
- [31] HiVLP: Hierarchical Interactive Video-Language Pre-Training. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 13710-13720
- [32] OmniVL: One Foundation Model for Image-Language and Video-Language Tasks. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [33] Object-aware Video-language Pre-training for Retrieval. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 3303-3312
- [34] Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [35] ε-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer. 2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024: 529-540
- [36] All in One: Exploring Unified Video-Language Pre-training. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 6598-6608
- [39] PiTe: Pixel-Temporal Alignment for Large Video-Language Model. COMPUTER VISION - ECCV 2024, PT V, 2025, 15063: 160-176
- [40] Multimodal Analysis for Deep Video Understanding with Video Language Transformer. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 7165-7169