Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of candidate images given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly learn the representations of images and recipes independently and then stitch them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components within images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, simultaneously considering word-guided image attention and image-guided word attention. Lastly, the representations of images and recipes learned by the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. Through experiments on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
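As a rough, non-authoritative sketch of the mechanisms summarized in the abstract, the PyTorch snippet below illustrates (i) bidirectional cross-attention, where image regions attend over recipe words (image-guided word attention) and recipe words attend over image regions (word-guided image attention), and (ii) a bidirectional pairwise ranking loss with in-batch negatives. Everything here is an assumption for illustration: the names (CrossAttention, pairwise_ranking_loss), the single shared embedding dimension, the mean pooling of attended components, and the margin value are not taken from the paper, whose exact architecture and hyperparameters differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    # Hypothetical cross-attention block: each modality attends over the
    # other's components, yielding one attended embedding per modality.
    def __init__(self, dim: int):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)  # query projection for regions
        self.txt_proj = nn.Linear(dim, dim)  # query projection for words

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        # img: (B, R, D) image-region features; txt: (B, W, D) word features
        q_img = self.img_proj(img)
        q_txt = self.txt_proj(txt)
        # Image-guided word attention: each region attends over all words.
        attn_i2t = F.softmax(q_img @ txt.transpose(1, 2), dim=-1)  # (B, R, W)
        img_ctx = attn_i2t @ txt                                   # (B, R, D)
        # Word-guided image attention: each word attends over all regions.
        attn_t2i = F.softmax(q_txt @ img.transpose(1, 2), dim=-1)  # (B, W, R)
        txt_ctx = attn_t2i @ img                                   # (B, W, D)
        # Mean-pool attended components into single embeddings (assumption).
        return img_ctx.mean(dim=1), txt_ctx.mean(dim=1)

def pairwise_ranking_loss(img_emb, txt_emb, margin: float = 0.2):
    # Bidirectional max-margin ranking loss with in-batch negatives;
    # matched image-recipe pairs sit on the diagonal of the score matrix.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    scores = img_emb @ txt_emb.t()                  # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)                 # positive-pair scores
    cost_i2t = (margin + scores - pos).clamp(min=0)      # image -> recipe
    cost_t2i = (margin + scores - pos.t()).clamp(min=0)  # recipe -> image
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return (cost_i2t.masked_fill(mask, 0).mean()
            + cost_t2i.masked_fill(mask, 0).mean())

# Shape-level smoke test with random features (dimensions are illustrative).
img = torch.randn(8, 36, 512)   # 8 images, 36 regions each
txt = torch.randn(8, 20, 512)   # 8 recipes, 20 words each
v, t = CrossAttention(512)(img, txt)
print(pairwise_ranking_loss(v, t))
```

A parallel-attention branch would be analogous but intra-modal: each modality computes attention weights over its own components without conditioning on the other modality, and the two branches' outputs are then combined before the ranking loss.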
Pages: 12