Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Times cited: 1
Authors
Shao, Yi [1]
Sun, Jiande [1]
Jiang, Ye [1]
Li, Jing [2]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China
[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China
Source
COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023
Keywords
multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss
DOI
10.1145/3543873.3587649
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Olfactory experience is especially effective at evoking human memories and emotions, in some cases even more so than vision. Studies have shown that descriptions of olfactory scenes in images and text can also stimulate human olfactory imagination, yet few studies have addressed related problems from the perspective of computer vision and natural language processing (NLP). This paper proposes a multimodal model that detects whether paired text and image samples convey a similar olfactory experience. The model has two stages, coarse-grained and fine-grained matching, which promote each other. In the coarse-grained stage, a feature-fusion method built on pre-trained CLIP is trained for matching, yielding a preliminary feature extractor that is used to promote fine-grained matching training. In the fine-grained stage, a similarity calculation based on stacked cross attention is trained for matching, yielding the final feature extractor, which in turn promotes coarse-grained matching training. In addition, we manually build a list of approximate olfactory nouns during fine-grained training; feeding this list back into the fine-grained matching process yields significantly better performance, and the list itself can be used in future research. Experiments on the MUSTI task dataset of MediaEval 2022 show that both the coarse-grained and the fine-grained matching stages of the proposed model perform well, and both of their F1 scores exceed those of the existing baseline models.
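The abstract says the fine-grained stage scores text-image pairs with a similarity calculation based on stacked cross attention, but it gives no formulas. As an illustration only, the following minimal sketch follows the general text-to-image idea of Stacked Cross Attention (SCAN): each word attends over image-region features, the attended region vector is scored against that word, and the per-word scores are pooled with LogSumExp. The function names, the temperatures `lambda1` and `lambda2`, the plain-list feature vectors, and the omission of SCAN's extra similarity normalization step are all assumptions, not details taken from this paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical, simplified SCAN-style scoring; not the authors' implementation.


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def t2i_scan_similarity(
    words: List[Vector],
    regions: List[Vector],
    lambda1: float = 9.0,  # assumed attention temperature
    lambda2: float = 6.0,  # assumed LogSumExp pooling factor
) -> float:
    """Text-to-image cross-attention score for one text-image pair."""
    dim = len(regions[0])
    relevances = []
    for w in words:
        # Clipped (ReLU) cosine similarity between this word and every region.
        sims = [max(cosine(r, w), 0.0) for r in regions]
        # Attention weights over the regions, sharpened by lambda1.
        weights = softmax([lambda1 * s for s in sims])
        # Attended image vector: attention-weighted sum of region features.
        attended = [sum(a * r[k] for a, r in zip(weights, regions)) for k in range(dim)]
        # Relevance between the word and its attended image context.
        relevances.append(cosine(w, attended))
    # LogSumExp pooling over words: (1 / lambda2) * log(sum(exp(lambda2 * R))),
    # rewritten around the maximum for numerical stability.
    m = max(relevances)
    return m + math.log(sum(math.exp(lambda2 * (r - m)) for r in relevances)) / lambda2
```

With toy inputs, word vectors that each align with a distinct region score higher than the same words compared against unrelated regions. In the paper's setting the word and region features would come from the trained text and image encoders, which this sketch does not model.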
Pages: 669-677
Number of pages: 9