Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Cited by: 1
Authors
Shao, Yi [1]
Sun, Jiande [1]
Jiang, Ye [1]
Li, Jing [2]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China
[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China
Keywords
multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss
DOI
10.1145/3543873.3587649
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Olfactory experience is highly effective at awakening human memories and emotions, and in some cases may even surpass vision. Studies have shown that descriptions of olfactory scenes in images and text can also arouse human olfactory imagination, but few studies have addressed related problems from the perspective of computer vision and NLP. This paper proposes a multimodal model that detects similar olfactory experiences in paired text-image samples. The model has two stages, coarse-grained and fine-grained, that promote each other. For coarse-grained matching, it trains a feature fusion method based on pre-trained CLIP to obtain a preliminary feature extractor, which promotes fine-grained matching training. For fine-grained matching, it trains a similarity calculation method based on stacked cross attention to obtain the final feature extractor, which in turn promotes coarse-grained matching training. Finally, we manually build a list of approximate olfactory nouns during fine-grained matching training; feeding this list back into the fine-grained matching process yields significantly better performance, and the list can also be used in future research. Experiments on the MUSTI task dataset of MediaEval2022 show that both the coarse-grained and fine-grained matching stages of the proposed model perform well, and that both F1 measures exceed those of the existing baseline models.
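The abstract does not give the details of the fine-grained similarity calculation, so the following is only a minimal NumPy sketch of the general idea behind cross-attention matching between word features and image-region features. The function name `cross_attention_similarity`, the `temperature` value, the zero-thresholding of similarities, and the mean pooling over words are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length along the given axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_similarity(words, regions, temperature=9.0):
    """Hypothetical text-to-image matching score.

    words:   (n_words, d) array of word features
    regions: (n_regions, d) array of image-region features
    Returns a scalar; higher means a better text-image match.
    """
    w = l2_normalize(words)
    r = l2_normalize(regions)
    # Cosine similarity between every word and every region.
    sim = np.maximum(w @ r.T, 0.0)             # assumed: clip negatives to zero
    # Each word attends over all regions.
    attn = softmax(temperature * sim, axis=1)  # (n_words, n_regions)
    attended = attn @ regions                  # (n_words, d)
    # Relevance of each word to its attended image context.
    relevance = np.sum(w * l2_normalize(attended), axis=1)
    # Assumed pooling: average relevance over words.
    return float(relevance.mean())

# Toy usage with one-hot features (illustrative data only).
basis = np.eye(4)
matched = cross_attention_similarity(basis[:2], basis[:2])
mismatched = cross_attention_similarity(basis[:2], basis[2:])
```

In this toy setup the matched pair scores higher than the mismatched pair, which is the behavior a fine-grained matching stage would rely on when ranking text-image pairs.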
Pages: 669-677
Number of pages: 9