Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of image candidates given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
Pages: 12
Related papers
50 records
  • [21] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [22] Hybrid representation learning for cross-modal retrieval
    Cao, Wenming
    Lin, Qiubin
    He, Zhihai
    He, Zhiquan
    NEUROCOMPUTING, 2019, 345 : 45 - 57
  • [23] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [24] Federated learning for supervised cross-modal retrieval
    Li, Ang
    Li, Yawen
    Shao, Yingxia
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (04):
  • [25] Parallel Pathways for Cross-Modal Memory Retrieval in Drosophila
    Zhang, Xiaonan
    Ren, Qingzhong
    Guo, Aike
    JOURNAL OF NEUROSCIENCE, 2013, 33 (20): : 8784 - 8793
  • [26] Real-world Cross-modal Retrieval via Sequential Learning
    Song, Ge
    Tan, Xiaoyang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1708 - 1721
  • [27] Cross-Modal Search for Social Networks via Adversarial Learning
    Zhou, Nan
    Du, Junping
    Xue, Zhe
    Liu, Chong
    Li, Jinxuan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [28] Improving Cross-Modal Recipe Embeddings with Cross Decoder
    Yang, Jing
    Chen, Junwen
    Yanai, Keiji
    PROCEEDINGS OF THE 5TH ACM WORKSHOP ON INTELLIGENT CROSS-DATA ANALYSIS AND RETRIEVAL, ICDAR 2024, 2024, : 1 - 4
  • [29] ADVERSARIAL CROSS-MODAL RETRIEVAL VIA LEARNING AND TRANSFERRING SINGLE-MODAL SIMILARITIES
    Wen, Xin
    Han, Zhizhong
    Yin, Xinyu
    Liu, Yu-Shen
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 478 - 483
  • [30] Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service
    Xie, Zhongwei
    Liu, Ling
    Wu, Yanzhao
    Li, Lin
    Zhong, Luo
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3304 - 3316