Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 1
Authors
Zheng, Min [1]
Wu, Chunpeng [1]
Qin, Jiaqi [1]
Liu, Weiwei [1]
Chen, Ming [2]
Lin, Long [1]
Zhou, Fei [1]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation
DOI
10.1007/978-3-031-44198-1_24
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained vision-language alignment has two primary requirements: learning features that discriminate between fine-grained sub-categories, and aligning heterogeneous data. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address both challenges. Specifically, to make features more discriminative within each modality, the method extracts global and contextual features for both vision and language and learns them jointly. The method then constructs a shared semantic space that bridges the semantic correlation between heterogeneous data. Extensive experiments demonstrate the effectiveness of the approach.
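The abstract gives no architectural details, so the following is only a rough sketch of the general idea, not the authors' method. It assumes mean pooling for the "global" feature, dot-product self-attention followed by pooling for the "contextual" feature, concatenation as the fusion step, and one random linear projection per modality as a stand-in for a learned shared semantic space. All function names, dimensions, and weights are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens):
    """Assumed 'global' feature: average of all local (region or word) vectors."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def contextual_pool(tokens):
    """Assumed 'contextual' feature: each local vector attends to all others
    (scaled dot-product self-attention); the attended vectors are averaged."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        attended.append(
            [sum(w * k[i] for w, k in zip(weights, tokens)) for i in range(dim)]
        )
    return mean_pool(attended)


def l2_normalize(vec):
    norm = math.sqrt(dot(vec, vec)) or 1.0
    return [v / norm for v in vec]


def embed(tokens, matrix):
    """Fuse global and contextual features by concatenation, then project
    into the shared space with a modality-specific matrix."""
    fused = mean_pool(tokens) + contextual_pool(tokens)  # list concatenation
    return l2_normalize([dot(row, fused) for row in matrix])


def random_matrix(rows, cols, rng):
    # Placeholder for weights that would be learned in practice.
    return [[rng.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
image_regions = [[rng.random() for _ in range(4)] for _ in range(5)]  # 5 regions, dim 4
text_words = [[rng.random() for _ in range(6)] for _ in range(3)]     # 3 words, dim 6

shared_dim = 8
W_img = random_matrix(shared_dim, 2 * 4, rng)
W_txt = random_matrix(shared_dim, 2 * 6, rng)

img_emb = embed(image_regions, W_img)
txt_emb = embed(text_words, W_txt)

# Cosine similarity in the shared space (both embeddings are unit length).
similarity = dot(img_emb, txt_emb)
print(f"image-text similarity: {similarity:.4f}")
```

In a trained system the two projection matrices would be optimized so that matching image-text pairs score higher than mismatched ones; with the random weights used here the similarity value carries no meaning.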
Pages: 281-293
Page count: 13