Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 1
Authors
Zheng, Min [1]
Wu, Chunpeng [1]
Qin, Jiaqi [1]
Liu, Weiwei [1]
Chen, Ming [2]
Lin, Long [1]
Zhou, Fei [1]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation
DOI
10.1007/978-3-031-44198-1_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained vision-language alignment has two primary requirements: learning features that are discriminative enough to separate fine-grained sub-categories, and aligning heterogeneous data. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address these challenges. Specifically, to make features more discriminative within each modality, the method extracts global and contextual features for both vision and language and learns them jointly. The method then constructs a shared semantic space that bridges the semantic correlations between heterogeneous data. Extensive experiments demonstrate the effectiveness of the approach.
Pages: 281-293
Number of pages: 13