Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Times cited: 1
Authors
Shao, Yi [1]
Sun, Jiande [1]
Jiang, Ye [1]
Li, Jing [2]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China
[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China
Source
COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023
Keywords
multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss;
DOI
10.1145/3543873.3587649
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Olfactory experience has great advantages in awakening human memories and emotions, and in some cases may even surpass vision. Studies have shown that descriptions of olfactory scenes in images and text can also evoke human olfactory imagination, yet few studies have addressed related problems from the perspective of computer vision and NLP. This paper proposes a multimodal model that detects similar olfactory experiences in paired text-image samples. The model has two mutually promoting stages, coarse-grained and fine-grained. In the coarse-grained stage, the model performs matching training with a feature fusion method based on pre-trained CLIP, obtaining a preliminary feature extractor that promotes fine-grained matching training. In the fine-grained stage, it performs matching training with a similarity calculation method based on stacked cross attention, obtaining the final feature extractor, which in turn promotes coarse-grained matching training. Finally, we manually build a list of approximate olfactory nouns during fine-grained matching training. Feeding this list back into the fine-grained matching process yields significantly better performance, and the list can also be used in future research. Experiments on the MUSTI task dataset of MediaEval 2022 show that both the coarse-grained and the fine-grained matching stages of the proposed model perform well, with F1 scores exceeding those of the existing baseline models.
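The fine-grained stage scores text-image pairs with a similarity calculation based on stacked cross attention. For illustration only, the following is a minimal sketch of the text-to-image variant of stacked cross attention, following the general recipe of SCAN (Lee et al., ECCV 2018); it is not the authors' implementation. It assumes word features and image-region features have already been extracted by some encoder (hypothetical inputs here). The function names, the normalization axis, the softmax temperature `lambda_softmax = 9.0`, and the average pooling over words are all assumptions, not details reported in this record.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(x: Vector, eps: float = 1e-8) -> Vector:
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def stacked_cross_attention_t2i(words: List[Vector], regions: List[Vector],
                                lambda_softmax: float = 9.0,
                                eps: float = 1e-8) -> float:
    """Hypothetical SCAN-style text-to-image similarity (sketch, not the paper's code).

    words:   word feature vectors (n_words x d)
    regions: image region feature vectors (n_regions x d), same d as words
    """
    w = [l2_normalize(x) for x in words]
    v = [l2_normalize(r) for r in regions]
    n, k = len(w), len(v)
    # s[i][j]: cosine similarity of word i and region j, negatives clipped to 0.
    s = [[max(dot(w[i], v[j]), 0.0) for j in range(k)] for i in range(n)]
    # Normalize each region's column of similarities across words.
    for j in range(k):
        col_norm = math.sqrt(sum(s[i][j] ** 2 for i in range(n)))
        for i in range(n):
            s[i][j] /= col_norm + eps
    relevances = []
    for i in range(n):
        # Softmax over regions (temperature lambda_softmax) gives word i's attention.
        logits = [lambda_softmax * x for x in s[i]]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        attn = [e / z for e in exps]
        # Attended image vector: attention-weighted sum of region features.
        attended = [sum(attn[j] * v[j][d] for j in range(k)) for d in range(len(w[i]))]
        # Relevance: cosine between the word and its attended image context.
        relevances.append(dot(w[i], l2_normalize(attended)))
    # Average pooling over words (SCAN also describes LogSumExp pooling).
    return sum(relevances) / n
```

For example, `stacked_cross_attention_t2i([[1, 0], [0, 1]], [[1, 0], [0, 1]])` returns a value close to 1, because each word attends almost entirely to a matching region, while word and region vectors that are mutually orthogonal score 0.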
Pages: 669-677
Number of pages: 9