Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

Cited by: 6
Authors
Malali, Noam [1]
Keller, Yosi [1]
Affiliations
[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
Keywords
Text and image fusion; deep learning; joint embedding;
DOI
10.1109/TPAMI.2021.3132163
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a deep learning approach for learning joint semantic embeddings of images and captions in a Euclidean space, such that semantic similarity is approximated by the L2 distances in the embedding space. To that end, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
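The abstract names two generic building blocks, a hinge loss over L2 distances and a center loss. The following is a minimal sketch of the textbook forms of both, not the authors' formulation: the function names, the fixed `margin` argument, and the toy 2-D vectors are all assumptions made for illustration, and the paper's adaptive margin refinement and quantization scheme are not reproduced here.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hinge_triplet_loss(anchor, positive, negative, margin):
    """Generic hinge loss on L2 distances (hypothetical helper).

    The loss is zero once the negative sits at least `margin` farther
    from the anchor than the positive does. `margin` is fixed here;
    the paper describes an adaptive margin whose update rule is not
    given in the abstract.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

def center_loss(embeddings, center):
    """Mean squared L2 distance of embeddings to a shared concept center."""
    return sum(l2_distance(e, center) ** 2 for e in embeddings) / len(embeddings)

# Toy example: an image embedding, a matching caption, a non-matching caption.
image = [0.0, 0.0]
caption_match = [3.0, 4.0]   # L2 distance 5 from the image
caption_other = [6.0, 8.0]   # L2 distance 10 from the image

print(hinge_triplet_loss(image, caption_match, caption_other, margin=2.0))  # 0.0
print(hinge_triplet_loss(image, caption_match, caption_other, margin=7.0))  # 2.0
```

With a margin of 2 the non-matching caption is already far enough away, so the loss vanishes; with a margin of 7 the constraint is violated by 2, which is the value the loss returns.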
Pages: 10252-10260
Page count: 9
Related papers
50 in total
  • [31] Visual context learning based on textual knowledge for image-text retrieval
    Qin, Yuzhuo
    Gu, Xiaodong
    Tan, Zhenshan
    NEURAL NETWORKS, 2022, 152 : 434 - 449
  • [32] Scene Text Retrieval via Joint Text Detection and Similarity Learning
    Wang, Hao
    Bai, Xiang
    Yang, Mingkun
    Zhu, Shenggao
    Wang, Jing
    Liu, Wenyu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4556 - 4565
  • [33] Review of Recent Deep Learning Based Methods for Image-Text Retrieval
    Chen, Jianan
    Zhang, Lu
    Bai, Cong
    Kpalma, Kidiyo
    THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2020), 2020, : 171 - 176
  • [34] Image-Text Embedding Learning via Visual and Textual Semantic Reasoning
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 641 - 656
  • [35] Learning Image-Text Associations
    Jiang, Tao
    Tan, Ah-Hwee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (02) : 161 - 177
  • [36] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2866 - 2879
  • [37] Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
    Mithun, Niluthpol Chowdhury
    Panda, Rameswar
    Papalexakis, Evangelos E.
    Roy-Chowdhury, Amit K.
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1856 - 1864
  • [38] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    SENSORS, 2023, 23 (05)
  • [39] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [40] Similarity Reasoning and Filtration for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Ma, Lin
    Lu, Huchuan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226