Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

Cited by: 6

Authors:

Malali, Noam [1]

Keller, Yosi [1]

Affiliation:

[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 12

Keywords:

Text and image fusion; deep learning; joint embedding

DOI:

10.1109/TPAMI.2021.3132163

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a deep learning approach for learning the joint semantic embeddings of images and captions in a Euclidean space, such that the semantic similarity is approximated by the L2 distances in the embedding space. For that, we introduce a metric learning scheme that utilizes multitask learning to learn the embedding of identical semantic concepts using a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
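
As a rough illustration of the two loss terms named in the abstract, the following sketch computes a hinge loss over L2 distances and a center-style loss on plain Python lists. It is not the authors' implementation: the function names, the fixed `margin` value, and the mean-squared form of the center term are assumptions made for this example, and the adaptive margin refinement and differentiable quantization described in the abstract are not reproduced.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge_triplet_loss(image, caption_pos, caption_neg, margin=0.2):
    """Hinge loss over L2 distances.

    The loss is zero once the matching caption is closer to the image
    than the non-matching caption by at least `margin`. The margin is a
    fixed, hypothetical value here; the abstract describes an adaptive one.
    """
    d_pos = l2_distance(image, caption_pos)
    d_neg = l2_distance(image, caption_neg)
    return max(0.0, margin + d_pos - d_neg)


def center_loss(embeddings, center):
    """Mean squared L2 distance of embeddings to a shared concept center.

    This is a simplified variant chosen for illustration; it pulls all
    embeddings of one semantic concept toward a single point.
    """
    total = sum(l2_distance(e, center) ** 2 for e in embeddings)
    return total / len(embeddings)
```

In this sketch, a triplet whose matching caption already lies well inside the margin contributes no loss, so only violating triplets would drive an update.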

Pages: 10252-10260

Number of pages: 9

50 items in total

[1] Joint Image-text Representation Learning for Fashion Retrieval
Yan, Cairong
Li, Yu
Wan, Yongquan
Zhang, Zhaohui
ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2018, : 412 - 417
[2] Multi-level similarity learning for image-text retrieval
Li, Wen-Hui
Yang, Song
Wang, Yan
Song, Dan
Li, Xuan-Ya
INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (01)
[3] Enhanced Semantic Similarity Learning Framework for Image-Text Matching
Zhang, Kun
Hu, Bo
Zhang, Huatian
Li, Zhe
Mao, Zhendong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2973 - 2988
[4] Semantic Completion and Filtration for Image-Text Retrieval
Yang, Song
Li, Qiang
Li, Wenhui
Li, Xuan-Ya
Jin, Ran
Lv, Bo
Wang, Rui
Liu, Anan
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
[5] Remote sensing image-text retrieval based on layout semantic joint representation
Zhang R.
Nie J.
Song N.
Zheng C.
Wei Z.
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (02) : 671 - 683
[6] Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
Zeng, Sheng
Liu, Changhong
Zhou, Jun
Chen, Yong
Jiang, Aiwen
Li, Hanxi
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 239 - 248
[7] Compositional Learning of Image-Text Query for Image Retrieval
Anwaar, Muhammad Umer
Labintcev, Egor
Kleinsteuber, Martin
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
[8] Scale-Semantic Joint Decoupling Network for Image-Text Retrieval in Remote Sensing
Zheng, Chengyu
Song, Ning
Zhang, Ruoyu
Huang, Lei
Wei, Zhiqiang
Nie, Jie
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)
[9] Cross-Modal Image-Text Retrieval with Semantic Consistency
Chen, Hui
Ding, Guiguang
Lin, Zijin
Zhao, Sicheng
Han, Jungong
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
[10] Learning Multi-view Embedding in Joint Space for Bidirectional Image-Text Retrieval
Ran, Lu
Wang, Wenmin
2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2017,