Cross-modal recipe retrieval via parallel- and cross-attention networks learning

Cited by: 10
Authors
Cao, Da [1]
Chu, Jingjing [1]
Zhu, Ningbo [1]
Nie, Liqiang [2]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266000, Shandong, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Recipe retrieval; Parallel-attention network; Cross-attention network; Cross-modal retrieval;
DOI
10.1016/j.knosys.2019.105428
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of image candidates given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. To this end, we study the problem of cross-modal recipe retrieval from the viewpoint of learning parallel- and cross-attention networks. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are carefully combined and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses. © 2019 Published by Elsevier B.V.
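The abstract names two ingredients that a short sketch may help make concrete: attention in which each modality guides the weighting of the other, and a pairwise ranking loss over matched image-recipe pairs. The NumPy code below is only an illustration under stated assumptions and is not the authors' implementation: the dot-product affinity, the max-pooling of affinities, the margin of 0.2, and the function names `cross_attend` and `pairwise_ranking_loss` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(regions, words):
    """Assumed cross-attention: regions (R, d), words (W, d) -> two (d,) vectors."""
    affinity = regions @ words.T                    # (R, W) region-word affinities
    # Word-guided image attention: weight each region by its best-matching word.
    region_weights = softmax(affinity.max(axis=1))  # (R,)
    # Image-guided word attention: weight each word by its best-matching region.
    word_weights = softmax(affinity.max(axis=0))    # (W,)
    return region_weights @ regions, word_weights @ words


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def pairwise_ranking_loss(image_embs, recipe_embs, margin=0.2):
    """Bidirectional hinge loss; row i of each matrix is assumed to be a true pair."""
    sim = l2_normalize(image_embs) @ l2_normalize(recipe_embs).T  # (N, N) cosine
    pos = np.diag(sim)                                            # matched pairs
    # A mismatched pair is penalised when it scores within `margin` of the match.
    cost_img_to_rec = np.maximum(0.0, margin + sim - pos[:, None])
    cost_rec_to_img = np.maximum(0.0, margin + sim - pos[None, :])
    n = sim.shape[0]
    off_diagonal = 1.0 - np.eye(n)                                # skip true pairs
    return float(((cost_img_to_rec + cost_rec_to_img) * off_diagonal).sum() / n)
```

In this sketch the loss is zero when every matched pair outscores all mismatched pairs by at least the margin, and it grows as mismatched images and recipes become more similar than the true pairs.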
Pages: 12
Related papers
50 in total
  • [1] Cross-modal recipe retrieval with stacked attention model
    Jing-Jing Chen
    Lei Pang
    Chong-Wah Ngo
    Multimedia Tools and Applications, 2018, 77 : 29457 - 29473
  • [2] Cross-modal recipe retrieval with stacked attention model
    Chen, Jing-Jing
    Pang, Lei
    Ngo, Chong-Wah
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (22) : 29457 - 29473
  • [3] Revamping Image-Recipe Cross-Modal Retrieval with Dual Cross Attention Encoders
    Liu, Wenhao
    Yuan, Simiao
    Wang, Zhen
    Chang, Xinyi
    Gao, Limeng
    Zhang, Zhenrui
    MATHEMATICS, 2024, 12 (20)
  • [4] PBLF: Prompt Based Learning Framework for Cross-Modal Recipe Retrieval
    Sun, Jialiang
    Li, Jiao
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT I, 2022, 1700 : 388 - 402
  • [5] Video-Based Cross-Modal Recipe Retrieval
    Cao, Da
    Yu, Zhiwang
    Zhang, Hanling
    Fang, Jiansheng
    Nie, Liqiang
    Tian, Qi
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1685 - 1693
  • [6] Cross-modal Recipe Retrieval with Rich Food Attributes
    Chen, Jing-Jing
    Ngo, Chong-Wah
    Chua, Tat-Seng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1771 - 1779
  • [7] Cross-Modal Recipe Retrieval: How to Cook this Dish?
    Chen, Jingjing
    Pang, Lei
    Ngo, Chong-Wah
    MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 588 - 600
  • [8] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [9] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [10] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16