The Style Transformer with Common Knowledge Optimization for Image-Text Retrieval

Cited by: 2
Authors
Li W. [1]
Ma Z. [2]
Shi J. [3]
Fan X. [1]
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology, Harbin
[2] Peng Cheng Laboratory, Shenzhen
[3] Beijing University of Posts and Telecommunications, School of Cyberspace Security, Beijing
Keywords
Image-text retrieval; transformer
DOI
10.1109/LSP.2023.3310870
Abstract
Image-text retrieval, which associates different modalities, has drawn broad attention due to its research value and wide range of real-world applications. However, most existing methods have not taken the high-level semantic relationships ('style embedding') and the common knowledge shared across modalities into full consideration. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. The main module is the common knowledge adaptor (CKA), which comprises the style embedding extractor (SEE) and the common knowledge optimization (CKO) modules. Specifically, the SEE uses a sequential update strategy to effectively connect the features of its different stages. The CKO module is introduced to dynamically capture the latent concepts of common knowledge from different modalities. In addition, to obtain generalized temporal common knowledge, we propose a sequential update strategy to effectively integrate the features of different layers in SEE with previous common feature units. CKSTN demonstrates superiority over state-of-the-art methods in image-text retrieval on the MSCOCO and Flickr30K datasets. Moreover, CKSTN is built on a lightweight transformer, which makes it more convenient and practical for real-world applications owing to its better performance and fewer parameters. © 1994-2012 IEEE.
Pages: 1197-1201
Number of pages: 4
Related papers
50 in total
  • [11] Compositional Learning of Image-Text Query for Image Retrieval
    Anwaar, Muhammad Umer
    Labintcev, Egor
    Kleinsteuber, Martin
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
  • [12] Kernel triplet loss for image-text retrieval
    Pan, Zhengxin
    Wu, Fangyu
    Zhang, Bailing
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [13] Dynamic Contrastive Distillation for Image-Text Retrieval
    Rao, Jun
    Ding, Liang
    Qi, Shuhan
    Fang, Meng
    Liu, Yang
    Shen, Li
    Tao, Dacheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8383 - 8395
  • [14] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [15] Spatial-Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval
    Wu, Dongqing
    Li, Huihui
    Hou, Yinxuan
    Xu, Cuili
    Cheng, Gong
    Guo, Lei
    Liu, Hang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [16] Image-text fusion transformer network for sarcasm detection
    Liu, Jing
    Tian, Shengwei
    Yu, Long
    Shi, Xianwei
    Wang, Fan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 41895 - 41909
  • [17] MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval
    Feng, Duoduo
    He, Xiangteng
    Peng, Yuxin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (05)
  • [18] Multi-level knowledge-driven feature representation and triplet loss optimization network for image-text retrieval
    Qin, Xueyang
    Li, Lishang
    Hao, Fei
    Ge, Meiling
    Pang, Guangyao
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [19] Image-text fusion transformer network for sarcasm detection
    Jing Liu
    Shengwei Tian
    Long Yu
    Xianwei Shi
    Fan Wang
    Multimedia Tools and Applications, 2024, 83 : 41895 - 41909
  • [20] Dynamic Modality Interaction Modeling for Image-Text Retrieval
    Qu, Leigang
    Liu, Meng
    Wu, Jianlong
    Gao, Zan
    Nie, Liqiang
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1104 - 1113