Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

被引:6
|
作者
Malali, Noam [1 ]
Keller, Yosi [1 ]
机构
[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
关键词
Text and image fusion; deep learning; joint embedding;
D O I
10.1109/TPAMI.2021.3132163
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a deep learning approach for learning the joint semantic embeddings of images and captions in a euclidean space, such that the semantic similarity is approximated by the L-2 distances in the embedding space. For that, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss, that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flicke30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
引用
收藏
页码:10252 / 10260
页数:9
相关论文
共 50 条
  • [41] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022, 2022-May : 4828 - 4832
  • [42] MISL: Multi-grained image-text semantic learning for text-guided image inpainting
    Wu, Xingcai
    Zhao, Kejun
    Huang, Qianding
    Wang, Qi
    Yang, Zhenguo
    Hao, Gefei
    PATTERN RECOGNITION, 2024, 145
  • [43] Kernel triplet loss for image-text retrieval
    Pan, Zhengxin
    Wu, Fangyu
    Zhang, Bailing
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [44] Visual Contextual Semantic Reasoning for Cross-Modal Drone Image-Text Retrieval
    Huang, Jinghao
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [45] Characterization and classification of semantic image-text relations
    Christian Otto
    Matthias Springstein
    Avishek Anand
    Ralph Ewerth
    International Journal of Multimedia Information Retrieval, 2020, 9 : 31 - 45
  • [46] Mutil-level Local Alignment and Semantic Matching Network for Image-Text Retrieval
    Jiang, Zhukai
    Lian, Zhichao
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 212 - 224
  • [47] Reservoir Computing Transformer for Image-Text Retrieval
    Li, Wenrui
    Ma, Zhengyu
    Deng, Liang-Jian
    Wang, Penghong
    Shi, Jinqiao
    Fan, Xiaopeng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5605 - 5613
  • [48] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
  • [49] Dynamic Contrastive Distillation for Image-Text Retrieval
    Rao, Jun
    Ding, Liang
    Qi, Shuhan
    Fang, Meng
    Liu, Yang
    Shen, Li
    Tao, Dacheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8383 - 8395
  • [50] MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval
    Feng, Duoduo
    He, Xiangteng
    Peng, Yuxin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (05)