Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 1
Authors
Zheng, Min [1]
Wu, Chunpeng [1]
Qin, Jiaqi [1]
Liu, Weiwei [1]
Chen, Ming [2]
Lin, Long [1]
Zhou, Fei [1]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation
DOI
10.1007/978-3-031-44198-1_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained vision-language alignment has two primary requirements: learning features that discriminate between fine-grained sub-categories, and aligning heterogeneous data. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address both challenges. Specifically, to make features more discriminative within each modality, the method extracts global and contextual features for both vision and language and learns them jointly. In addition, the method constructs a shared semantic space that bridges the semantic correlations between heterogeneous data. Extensive experiments demonstrate the effectiveness of the approach.
Pages: 281-293
Number of pages: 13
Related papers
50 in total
  • [11] Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator
    Wang, Shijie
    Chang, Jianlong
    Li, Haojie
    Wang, Zhihui
    Ouyang, Wanli
    Tian, Qi
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19381 - 19391
  • [12] UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
    Sun, Rui
    Wang, Zhecan
    You, Haoxuan
    Codella, Noel
    Chang, Kai-Wei
    Chang, Shih-Fu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 778 - 793
  • [13] Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
    Lan, Long
    Wang, Fengxiang
    Zheng, Xiangtao
    Wang, Zengmao
    Liu, Xinwang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [14] Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding
    Wang, Yuechen
    Zhou, Wengang
    Li, Houqiang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 89 - 99
  • [15] FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training
    Han, Yunpeng
    Zhang, Lisai
    Chen, Qingcai
    Chen, Zhijian
    Li, Zhonghua
    Yang, Jianxin
    Cao, Zhao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15028 - 15038
  • [16] Unsupervised Visual-Textual Correlation Learning With Fine-Grained Semantic Alignment
    Peng, Yuxin
    Ye, Zhaoda
    Qi, Jinwei
    Zhuo, Yunkan
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (05) : 3669 - 3683
  • [17] Domain Adaptative Semantic Segmentation by Fine-Grained Alignment
    Li, Zhixin
    Li, Wei
    Zhang, Jia
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 383 - 394
  • [18] PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
    Zhang, Haosong
    Leong, Mei Chee
    Li, Liyuan
    Lin, Weisi
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 18857 - 18867
  • [19] CoPL: Contextual Prompt Learning for Vision-Language Understanding
    Goswami, Koustava
    Karanam, Srikrishna
    Udhayanan, Prateksha
    Joseph, K. J.
    Srinivasan, Balaji Vasan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18090 - 18098
  • [20] Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
    Ma, Zheng
    Pan, Mianzhi
    Wu, Wenhan
    Cheng, Kanzhi
    Zhang, Jianbing
    Huang, Shujian
    Chen, Jiajun
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5674 - 5685