Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

Cited by: 6
Authors
Malali, Noam [1]
Keller, Yosi [1]
Affiliation
[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
Keywords
Text and image fusion; deep learning; joint embedding;
DOI
10.1109/TPAMI.2021.3132163
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a deep learning approach for learning the joint semantic embeddings of images and captions in a Euclidean space, such that the semantic similarity is approximated by the L2 distances in the embedding space. To that end, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
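The two loss terms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the generic textbook forms of a triplet hinge loss over L2 distances and of a center loss, and the function names, the toy 2-D vectors, and the fixed margin values are all assumptions made for illustration. The abstract does not say how the adaptive margin is refined during training, so the margin is simply passed in as a parameter here.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hinge_triplet_loss(anchor, positive, negative, margin):
    """Generic hinge loss on L2 distances (assumed form, not the paper's).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, margin + l2_distance(anchor, positive)
                    - l2_distance(anchor, negative))

def center_loss(embeddings, labels, centers):
    """Generic center loss: half the mean squared L2 distance
    between each embedding and the center of its concept label."""
    total = 0.0
    for emb, label in zip(embeddings, labels):
        total += sum((a - b) ** 2 for a, b in zip(emb, centers[label]))
    return 0.5 * total / len(embeddings)

# Hypothetical toy embeddings: one image, its caption, an unrelated caption.
image = [0.0, 0.0]
matching_caption = [3.0, 4.0]   # L2 distance 5 from the image
other_caption = [6.0, 8.0]      # L2 distance 10 from the image

# A small margin is already satisfied; a larger one is violated.
loss_small_margin = hinge_triplet_loss(image, matching_caption, other_caption, margin=1.0)
loss_large_margin = hinge_triplet_loss(image, matching_caption, other_caption, margin=7.0)

# Two embeddings of the same (hypothetical) concept 0, pulled toward one center.
concept_loss = center_loss([[0.0, 0.0], [2.0, 0.0]], [0, 0], {0: [1.0, 0.0]})
```

Passing the margin as an argument leaves room for a schedule that changes it between training steps, which is one plausible reading of "adaptive margin"; the actual refinement rule would have to be taken from the paper itself.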
Pages: 10252-10260
Number of pages: 9