Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

Cited by: 6

Authors:

Malali, Noam [1]

Keller, Yosi [1]

Affiliations:

[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 12

Keywords:

Text and image fusion; deep learning; joint embedding;

DOI:

10.1109/TPAMI.2021.3132163

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a deep learning approach for learning the joint semantic embeddings of images and captions in a Euclidean space, such that the semantic similarity is approximated by the L2 distances in the embedding space. For that, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
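
As a rough illustration of the loss terms named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the textbook forms of a hinge (triplet) loss over L2 distances and of a center loss. The function names and toy vectors are hypothetical, and because the abstract does not state the rule by which the margin is refined during training, `margin` is simply passed in as a parameter.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hinge_triplet_loss(anchor, positive, negative, margin):
    """Generic hinge loss: zero once the matching pair is closer than the
    non-matching pair by at least `margin` (assumed textbook form)."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

def center_loss(embeddings, center):
    """Mean squared L2 distance of embeddings to a shared concept center
    (assumed textbook form)."""
    return sum(l2_distance(e, center) ** 2 for e in embeddings) / len(embeddings)

# Hypothetical toy embeddings: one image and two captions.
image = [0.0, 0.0]
caption_match = [3.0, 4.0]   # L2 distance 5 from the image
caption_other = [6.0, 8.0]   # L2 distance 10 from the image

# A small margin is already satisfied; a larger one is violated.
loss_small_margin = hinge_triplet_loss(image, caption_match, caption_other, margin=1.0)
loss_large_margin = hinge_triplet_loss(image, caption_match, caption_other, margin=7.0)

# Two embeddings of the same concept pulled toward a common center.
concept_spread = center_loss([[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0])
```

Changing `margin` between calls shows why its value matters: the same triplet contributes no loss under a small margin and a positive loss under a larger one.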

Pages: 10252-10260

Page count: 9

50 records in total

[41] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
Liu, Xiaoqing
Zeng, Huanqiang
Shi, Yifan
Zhu, Jianqing
Ma, Kai-Kuang
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022, 2022-May : 4828 - 4832
[42] MISL: Multi-grained image-text semantic learning for text-guided image inpainting
Wu, Xingcai
Zhao, Kejun
Huang, Qianding
Wang, Qi
Yang, Zhenguo
Hao, Gefei
PATTERN RECOGNITION, 2024, 145
[43] Kernel triplet loss for image-text retrieval
Pan, Zhengxin
Wu, Fangyu
Zhang, Bailing
COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
[44] Visual Contextual Semantic Reasoning for Cross-Modal Drone Image-Text Retrieval
Huang, Jinghao
Chen, Yaxiong
Xiong, Shengwu
Lu, Xiaoqiang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[45] Characterization and classification of semantic image-text relations
Christian Otto
Matthias Springstein
Avishek Anand
Ralph Ewerth
International Journal of Multimedia Information Retrieval, 2020, 9 : 31 - 45
[46] Mutil-level Local Alignment and Semantic Matching Network for Image-Text Retrieval
Jiang, Zhukai
Lian, Zhichao
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 212 - 224
[47] Reservoir Computing Transformer for Image-Text Retrieval
Li, Wenrui
Ma, Zhengyu
Deng, Liang-Jian
Wang, Penghong
Shi, Jinqiao
Fan, Xiaopeng
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5605 - 5613
[48] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
Liu, Xiaoqing
Zeng, Huanqiang
Shi, Yifan
Zhu, Jianqing
Ma, Kai-Kuang
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
[49] Dynamic Contrastive Distillation for Image-Text Retrieval
Rao, Jun
Ding, Liang
Qi, Shuhan
Fang, Meng
Liu, Yang
Shen, Li
Tao, Dacheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8383 - 8395
[50] MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval
Feng, Duoduo
He, Xiangteng
Peng, Yuxin
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (05)