Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 1
Authors
Zheng, Min [1 ]
Wu, Chunpeng [1 ]
Qin, Jiaqi [1 ]
Liu, Weiwei [1 ]
Chen, Ming [2 ]
Lin, Long [1 ]
Zhou, Fei [1 ]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation
DOI
10.1007/978-3-031-44198-1_24
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Fine-grained vision-language alignment hinges on two requirements: learning features discriminative enough to distinguish fine-grained sub-categories, and aligning heterogeneous data across modalities. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address these challenges. Specifically, to enhance intra-modality feature discrimination, the method extracts global and contextual features from both the vision and language modalities and learns them jointly. It further constructs a shared semantic space that bridges the semantic correlation between heterogeneous data. Extensive experiments demonstrate the effectiveness of our approach.
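To make the pipeline described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of global-to-contextual joint feature learning with a shared semantic space: each modality's global and contextual features are fused, projected into a common embedding space, and aligned with a symmetric contrastive loss. All module names, feature dimensions, and the specific fusion and loss choices are assumptions made for illustration.

```python
# Illustrative sketch only: fuse global + contextual features per modality,
# project into a shared semantic space, and align modalities contrastively.
# Dimensions, module names, and the loss are assumptions, not the paper's spec.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedSemanticAlignment(nn.Module):
    def __init__(self, vis_dim=2048, txt_dim=768, shared_dim=512):
        super().__init__()
        # Joint learning of global and contextual cues inside each modality.
        self.vis_fuse = nn.Linear(vis_dim * 2, shared_dim)
        self.txt_fuse = nn.Linear(txt_dim * 2, shared_dim)

    def forward(self, vis_global, vis_context, txt_global, txt_context):
        # Concatenate global and contextual features, then project both
        # modalities into the same (shared) semantic space.
        v = F.normalize(self.vis_fuse(torch.cat([vis_global, vis_context], dim=-1)), dim=-1)
        t = F.normalize(self.txt_fuse(torch.cat([txt_global, txt_context], dim=-1)), dim=-1)
        return v, t


def contrastive_alignment_loss(v, t, temperature=0.07):
    # Symmetric InfoNCE-style loss: matched image-text pairs lie on the diagonal.
    logits = v @ t.t() / temperature
    labels = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    model = SharedSemanticAlignment()
    vis_g, vis_c = torch.randn(8, 2048), torch.randn(8, 2048)
    txt_g, txt_c = torch.randn(8, 768), torch.randn(8, 768)
    v, t = model(vis_g, vis_c, txt_g, txt_c)
    print(contrastive_alignment_loss(v, t).item())
```

In this sketch the shared space is enforced only by projecting both modalities to the same dimensionality and training them against a common contrastive objective; the paper's actual construction of the shared semantic space and its feature extractors may differ.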
Pages: 281-293
Number of pages: 13
Related Papers
50 records in total
  • [1] Facial Expression Monitoring via Fine-Grained Vision-Language Alignment
    Ren, Weihong
    Gao, Yu
    Chen, Xiai
    Han, Zhi
    Wang, Zhiyong
    Wang, Jiaole
    Liu, Honghai
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024
  • [2] Efficiency-Aware Fine-Grained Vision-Language Retrieval via a Global-Contextual Autoencoder
    Zheng, Min
    Wu, Chunpeng
    Wang, Yue
    Liu, Weiwei
    Ye, Qinghe
    Chang, Ke
    Shi, Cuncun
    Zhou, Fei
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 410 - 423
  • [3] Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
    Ishmam, Alvi Md
    Thomas, Christopher
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 24820 - 24830
  • [4] Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition
    Sun, Hongbo
    He, Xiangteng
    Zhou, Jiahuan
    Peng, Yuxin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5828 - 5836
  • [5] Fine-grained multi-modal prompt learning for vision-language models
    Liu, Yunfei
    Deng, Yunziwei
    Liu, Anqi
    Liu, Yanan
    Li, Shengyang
    NEUROCOMPUTING, 2025, 636
  • [6] MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling
    Zhao, Zijia
    Guo, Longteng
    He, Xingjian
    Shao, Shuai
    Yuan, Zehuan
    Liu, Jing
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1528 - 1538
  • [7] PROMETHEUS- VISION: Vision-Language Model as a Judge for Fine-Grained Evaluation
    Lee, Seongyun
    Kim, Seungone
    Park, Sue Hyun
    Kim, Geewook
    Seo, Minjoon
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 11286 - 11315
  • [8] Fine-Grained Semantically Aligned Vision-Language Pre-Training
    Li, Juncheng
    He, Xin
    Wei, Longhui
    Qian, Long
    Zhu, Linchao
    Xie, Lingxi
    Zhuang, Yueting
    Tian, Qi
    Tang, Siliang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
    Varma, Maya
    Delbrouck, Jean-Benoit
    Hooper, Sarah
    Chaudhari, Akshay
    Langlotz, Curtis
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22168 - 22178
  • [10] Auxiliary Fine-grained Alignment Constraints for Vision-and-Language Navigation
    Cui, Yibo
    Huang, Ruqiang
    Zhang, Yakun
    Cen, Yingjie
    Xie, Liang
    Yan, Ye
    Yin, Erwei
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2621 - 2626