Combining Global and Local Similarity for Cross-Media Retrieval

Cited: 20
Authors
Li, Zhixin [1 ]
Ling, Feng [1 ]
Zhang, Canlong [1 ]
Ma, Huifang [2 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural network; self-attention network; attention mechanism; two-level network; cross-media retrieval;
DOI
10.1109/ACCESS.2020.2969808
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
This paper studies the problem of image-text matching for cross-media retrieval. Existing methods typically exploit only one level of information: they either match the whole image with the whole sentence, or match individual image regions with individual words. To better reveal the latent connections between image and text semantics, this paper proposes an image-text retrieval method that fuses these two levels of similarity. A cross-media two-level network is constructed to explore better matching between images and texts; it contains two subnetworks that handle global features and local features, respectively. Specifically, the image is decomposed into the whole picture and its regions, and the text into the whole sentence and its words, so that the full potential alignment between images and texts can be explored at both levels. A two-level alignment framework then lets the two levels reinforce each other, and fusing the two kinds of similarity yields a more complete cross-media representation. Experimental evaluation on the Flickr30K and MS-COCO datasets shows that the proposed method makes the semantic matching between images and texts more accurate and outperforms popular cross-media retrieval methods on various evaluation metrics.
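The abstract describes fusing a global similarity (whole image vs. whole sentence) with a local similarity (image regions vs. words). The Python fragment below is a minimal illustrative sketch of that fusion, not the authors' implementation; the function names, the 512-dimensional features, the max-over-regions aggregation, and the fusion weight alpha are all assumptions made for illustration.

# Minimal sketch (not the authors' code) of fusing global and local
# cross-media similarity, as described in the abstract above.
# All names and hyperparameters here are hypothetical.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def global_similarity(img_vec, txt_vec):
    """Whole image vs. whole sentence: one embedding each."""
    return cosine(img_vec, txt_vec)

def local_similarity(region_vecs, word_vecs):
    """Image regions vs. words: match each word to its best region and
    average the scores (a simple stand-in for attention-based alignment)."""
    scores = [max(cosine(r, w) for r in region_vecs) for w in word_vecs]
    return float(np.mean(scores))

def fused_similarity(img_vec, txt_vec, region_vecs, word_vecs, alpha=0.5):
    """Weighted fusion of the two similarity levels; alpha is a
    hypothetical trade-off hyperparameter."""
    return alpha * global_similarity(img_vec, txt_vec) + \
           (1 - alpha) * local_similarity(region_vecs, word_vecs)

# Toy usage with random features (the 512-d size is an assumption).
rng = np.random.default_rng(0)
img = rng.normal(size=512)
txt = rng.normal(size=512)
regions = rng.normal(size=(36, 512))   # e.g. 36 detected regions
words = rng.normal(size=(12, 512))     # e.g. 12 word embeddings
print(fused_similarity(img, txt, regions, words))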
Pages: 21847 - 21856
Number of pages: 10
Related papers
50 records in total
  • [21] CROSS-MODALITY CORRELATION PROPAGATION FOR CROSS-MEDIA RETRIEVAL
    Zhai, Xiaohua
    Peng, Yuxin
    Xiao, Jianguo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 2337 - 2340
  • [22] Toward cross-language and cross-media image retrieval
    Alvarez, C
    Oumohmed, AI
    Mignotte, M
    Nie, JY
    MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES, 2005, 3491 : 676 - 687
  • [23] LEARNING OPTIMAL DATA REPRESENTATION FOR CROSS-MEDIA RETRIEVAL
    Zhang, Hong
    Chen, Li
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1925 - 1928
  • [24] Internet cross-media retrieval based on deep learning
    Jiang, Bin
    Yang, Jiachen
    Lv, Zhihan
    Tian, Kun
    Meng, Qinggang
    Yan, Yan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 48 : 356 - 366
  • [25] Bagging-based cross-media retrieval algorithm
    Xu, Gongwen
    Zhang, Yu
    Yin, Mingshan
    Hong, Wenzhong
    Zou, Ran
    Wang, Shanshan
    SOFT COMPUTING, 2023, 27 (05) : 2615 - 2623
  • [26] Complementary information retrieval for cross-media news content
    Ma, Qiang
    Nadamoto, Akiyo
    Tanaka, Katsumi
    INFORMATION SYSTEMS, 2006, 31 (07) : 659 - 678
  • [27] Cross-media retrieval with collective deep semantic learning
    Zhang, Bin
    Zhu, Lei
    Sun, Jiande
    Zhang, Huaxiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 : 22247 - 22266
  • [28] Cross-media retrieval based on linear discriminant analysis
    Qi, Yudan
    Zhang, Huaxiang
    Zhang, Bin
    Wang, Li
    Zheng, Shunxin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (17) : 24249 - 24268
  • [29] Cross-media retrieval based on synthesis reasoning model
    College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao, 2009, 9 : 1307 - 1314
  • [30] Finding the best picture: Cross-media retrieval of content
    Deschacht, Koen
    Moens, Marie-Francine
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 539 - 546