The Style Transformer with Common Knowledge Optimization for Image-Text Retrieval

Cited by: 2
Authors
Li W. [1]
Ma Z. [2]
Shi J. [3]
Fan X. [1]
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology, Harbin
[2] Peng Cheng Laboratory, Shenzhen
[3] Beijing University of Posts and Telecommunications, School of Cyberspace Security, Beijing
Keywords
Image-text retrieval; transformer
DOI
10.1109/LSP.2023.3310870
Abstract
Image-text retrieval, which associates different modalities, has drawn broad attention because of its research value and wide range of real-world applications. However, most existing methods do not fully consider the high-level semantic relationships ('style embedding') and the common knowledge shared across modalities. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. Its main module is the common knowledge adaptor (CKA), which comprises the style embedding extractor (SEE) and the common knowledge optimization (CKO) modules. Specifically, the SEE uses a sequential update strategy to effectively connect the features of its different stages. The CKO module dynamically captures the latent concepts of common knowledge from different modalities. In addition, to obtain generalized temporal common knowledge, we propose a sequential update strategy that integrates the features of different layers in the SEE with previous common feature units. CKSTN outperforms state-of-the-art methods in image-text retrieval on the MSCOCO and Flickr30K datasets. Moreover, CKSTN is built on a lightweight transformer, which makes it more convenient and practical for real-world applications owing to its better performance and smaller number of parameters. © 1994-2012 IEEE.
Pages: 1197-1201
Number of pages: 4