Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Times cited: 1

Authors:

Shao, Yi [1]

Sun, Jiande [1]

Jiang, Ye [1]

Li, Jing [2]

Affiliations:

[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China

[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China

Source:

COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023

Keywords:

multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss;

DOI:

10.1145/3543873.3587649

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Olfactory experience is highly effective at evoking human memories and emotions, in some cases even more so than vision. Studies have shown that descriptions of olfactory scenes in images and text can also arouse human olfactory imagination, yet few studies have addressed related problems from the perspective of computer vision and natural language processing (NLP). This paper proposes a multimodal model that detects similar olfactory experiences in paired text-image samples. The model has two stages, coarse-grained and fine-grained. In the coarse-grained stage, it uses a feature fusion method based on pre-trained CLIP for matching training, yielding a preliminary feature extractor that promotes fine-grained matching training. In the fine-grained stage, it uses a similarity calculation method based on stacked cross attention for matching training, yielding the final feature extractor, which in turn promotes coarse-grained matching training. In addition, we manually build a list of approximate olfactory nouns during fine-grained matching training; feeding this list back into the fine-grained matching process yields significantly better performance, and the list can also be used in future research. Experiments on the MUSTI task dataset of MediaEval 2022 show that both the coarse-grained and fine-grained matching stages of the proposed model perform well, and the F1 scores of both stages exceed those of existing baseline models.
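The fine-grained stage scores a text-image pair with a similarity calculation based on stacked cross attention. The Python sketch below is a hedged illustration, not the authors' implementation. It follows the text-to-image direction of the general Stacked Cross Attention (SCAN) recipe: for each word, attend over image regions, then compare the word with its attended image vector. The function names, the softmax temperature `lam=9.0`, the choice of average pooling, and the assumption that region and word feature vectors are already available (for example from the feature extractor produced by the coarse-grained stage) are all assumptions made for illustration.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def t2i_stacked_cross_attention(regions, words, lam=9.0, eps=1e-8):
    """Text-to-image stacked cross attention score (SCAN-style sketch).

    regions: list of k image-region feature vectors (assumed precomputed)
    words:   list of n word feature vectors of the same dimension
    lam:     inverse softmax temperature; 9.0 is an assumed value
    Returns the average of the word-level relevance scores.
    """
    k = len(regions)
    # s[i][j]: cosine similarity between region i and word j
    s = [[cosine(v, e) for e in words] for v in regions]

    # Clip negative similarities, then L2-normalize each region's row over the words
    s_bar = []
    for row in s:
        pos = [max(x, 0.0) for x in row]
        norm = math.sqrt(sum(x * x for x in pos)) + eps
        s_bar.append([x / norm for x in pos])

    relevances = []
    for j, e in enumerate(words):
        # Softmax over regions gives word j's attention weights (max-shifted for stability)
        logits = [lam * s_bar[i][j] for i in range(k)]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        alpha = [x / z for x in exps]
        # Attended image vector for word j: attention-weighted sum of region features
        attended = [sum(alpha[i] * regions[i][d] for i in range(k)) for d in range(len(e))]
        # Relevance of word j to the image: cosine with its attended image vector
        relevances.append(cosine(e, attended))

    # Assumption: average pooling aggregates word relevances into one pair score
    return sum(relevances) / len(relevances)


matched = t2i_stacked_cross_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
unmatched = t2i_stacked_cross_attention([[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0]])
print(matched > unmatched)
```

In this toy usage the word vector aligns with one of the regions in the first pair and with neither region in the second, so the first pair receives the higher score. In training, a score like this would typically be computed for matched and mismatched pairs and passed to a classification or ranking loss. The keywords mention focal loss, but this record does not state how the authors combine it with the similarity score.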

Pages: 669-677

Number of pages: 9