Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of candidate images given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly learn the representations of images and recipes independently and then stitch them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components within images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, simultaneously considering word-guided image attention and image-guided word attention. Lastly, the representations of images and recipes learned by the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. Through experiments on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
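As a rough, non-authoritative sketch of the mechanisms summarized in the abstract, the PyTorch snippet below illustrates (i) bidirectional cross-attention, where image regions attend over recipe words (image-guided word attention) and recipe words attend over image regions (word-guided image attention), and (ii) a bidirectional pairwise ranking loss with in-batch negatives. Everything here is an assumption for illustration: the names (CrossAttention, pairwise_ranking_loss), the single shared embedding dimension, the mean pooling of attended components, and the margin value are not taken from the paper, whose exact architecture and hyperparameters differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    # Hypothetical cross-attention block: each modality attends over the
    # other's components, yielding one attended embedding per modality.
    def __init__(self, dim: int):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)  # query projection for regions
        self.txt_proj = nn.Linear(dim, dim)  # query projection for words

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        # img: (B, R, D) image-region features; txt: (B, W, D) word features
        q_img = self.img_proj(img)
        q_txt = self.txt_proj(txt)
        # Image-guided word attention: each region attends over all words.
        attn_i2t = F.softmax(q_img @ txt.transpose(1, 2), dim=-1)  # (B, R, W)
        img_ctx = attn_i2t @ txt                                   # (B, R, D)
        # Word-guided image attention: each word attends over all regions.
        attn_t2i = F.softmax(q_txt @ img.transpose(1, 2), dim=-1)  # (B, W, R)
        txt_ctx = attn_t2i @ img                                   # (B, W, D)
        # Mean-pool attended components into single embeddings (assumption).
        return img_ctx.mean(dim=1), txt_ctx.mean(dim=1)

def pairwise_ranking_loss(img_emb, txt_emb, margin: float = 0.2):
    # Bidirectional max-margin ranking loss with in-batch negatives;
    # matched image-recipe pairs sit on the diagonal of the score matrix.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    scores = img_emb @ txt_emb.t()                  # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)                 # positive-pair scores
    cost_i2t = (margin + scores - pos).clamp(min=0)      # image -> recipe
    cost_t2i = (margin + scores - pos.t()).clamp(min=0)  # recipe -> image
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return (cost_i2t.masked_fill(mask, 0).mean()
            + cost_t2i.masked_fill(mask, 0).mean())

# Shape-level smoke test with random features (dimensions are illustrative).
img = torch.randn(8, 36, 512)   # 8 images, 36 regions each
txt = torch.randn(8, 20, 512)   # 8 recipes, 20 words each
v, t = CrossAttention(512)(img, txt)
print(pairwise_ranking_loss(v, t))
```

A parallel-attention branch would be analogous but intra-modal: each modality computes attention weights over its own components without conditioning on the other modality, and the two branches' outputs are then combined before the ranking loss.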
Pages: 12