Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1 ]
Chu, Jingjing [1 ]
Zhu, Ningbo [1 ]
Nie, Liqiang [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of image candidates given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of parallel- and cross-attention network learning. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. (c) 2019 Published by Elsevier B.V.
Pages: 12
Related papers
50 records
  • [21] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [22] Hybrid representation learning for cross-modal retrieval
    Cao, Wenming
    Lin, Qiubin
    He, Zhihai
    He, Zhiquan
    NEUROCOMPUTING, 2019, 345 : 45 - 57
  • [23] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [24] Federated learning for supervised cross-modal retrieval
    Li, Ang
    Li, Yawen
    Shao, Yingxia
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (04):
  • [25] Parallel Pathways for Cross-Modal Memory Retrieval in Drosophila
    Zhang, Xiaonan
    Ren, Qingzhong
    Guo, Aike
    JOURNAL OF NEUROSCIENCE, 2013, 33 (20): : 8784 - 8793
  • [26] Real-world Cross-modal Retrieval via Sequential Learning
    Song, Ge
    Tan, Xiaoyang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1708 - 1721
  • [27] Cross-Modal Search for Social Networks via Adversarial Learning
    Zhou, Nan
    Du, Junping
    Xue, Zhe
    Liu, Chong
    Li, Jinxuan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [28] Improving Cross-Modal Recipe Embeddings with Cross Decoder
    Yang, Jing
    Chen, Junwen
    Yanai, Keiji
    PROCEEDINGS OF THE 5TH ACM WORKSHOP ON INTELLIGENT CROSS-DATA ANALYSIS AND RETRIEVAL, ICDAR 2024, 2024, : 1 - 4
  • [29] ADVERSARIAL CROSS-MODAL RETRIEVAL VIA LEARNING AND TRANSFERRING SINGLE-MODAL SIMILARITIES
    Wen, Xin
    Han, Zhizhong
    Yin, Xinyu
    Liu, Yu-Shen
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 478 - 483
  • [30] Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service
    Xie, Zhongwei
    Liu, Ling
    Wu, Yanzhao
    Li, Lin
    Zhong, Luo
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3304 - 3316