50 items in total
- [41] FLAVA: A Foundational Language And Vision Alignment Model. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15617-15629
- [42] ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs. COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS, PT I, ICCHP 2024, 2024, 14750: 299-305
- [43] BRAVE: Broadening the Visual Encoding of Vision-Language Models. COMPUTER VISION - ECCV 2024, PT XVI, 2025, 15074: 113-132
- [44] Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 2189-2196
- [45] Interpreting vision and language generative models with semantic visual priors. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2023, 6
- [48] VinVL: Revisiting Visual Representations in Vision-Language Models. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 5575-5584
- [49] Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 4779-4785
- [50] ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling. COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119: 37-53