Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 1
Authors
Zheng, Min [1]
Wu, Chunpeng [1]
Qin, Jiaqi [1]
Liu, Weiwei [1]
Chen, Ming [2]
Lin, Long [1]
Zhou, Fei [1]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation
DOI
10.1007/978-3-031-44198-1_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained vision-language alignment has two primary requirements: learning features that are effective at discriminating fine-grained sub-categories, and aligning heterogeneous data. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address these challenges. Specifically, to enhance the discrimination of features within each modality, the method extracts global and contextual features for both vision and language and learns them jointly. Further, the method constructs a shared semantic space that bridges the semantic correlation of heterogeneous data. Extensive experiments demonstrate the effectiveness of the approach.
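The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes that global and contextual features are fused by simple concatenation, that each modality has its own linear projection into a shared space, and that alignment is scored by cosine similarity. All function names, weight matrices, and toy values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def linear(weights, vec):
    """Apply a linear map given as a list of rows (no bias term)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def embed(global_feat, contextual_feat, weights):
    """Fuse global and contextual features, then project into the shared space.

    Assumption: fusion is plain concatenation; the abstract only says the
    features are learned jointly.
    """
    fused = list(global_feat) + list(contextual_feat)
    return l2_normalize(linear(weights, fused))


def similarity(a, b):
    """Cosine similarity of two unit-length embeddings (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical projection matrices: image features (4-d) and text features
# (3-d) are both mapped into a 2-d shared semantic space.
W_IMG = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
W_TXT = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0]]

# Toy features, chosen so the two embeddings point in the same direction.
img = embed([3.0, 1.0], [4.0, 2.0], W_IMG)
txt = embed([8.0], [6.0, 5.0], W_TXT)
score = similarity(img, txt)
```

In a trained system the projection matrices would be learned so that matching image-text pairs score higher than mismatched ones; here they are fixed by hand purely to show the data flow.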
Pages: 281-293
Number of pages: 13