Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Cited by: 1

Authors:

Shao, Yi [1]

Sun, Jiande [1]

Jiang, Ye [1]

Li, Jing [2]

Affiliations:

[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China

[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China

Source:

COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023

Keywords:

multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss;

DOI:

10.1145/3543873.3587649

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Olfactory experience is highly effective at awakening human memories and emotions, and in some cases may even surpass vision. Studies have shown that olfactory scene descriptions in images and text can also arouse human olfactory imagination, yet few studies address related problems from the perspective of computer vision and NLP. This paper proposes a multimodal model that detects similar olfactory experiences in paired text-image samples. The model has two stages, coarse-grained and fine-grained, that promote each other. For coarse-grained matching, it trains with a feature fusion method based on pre-trained CLIP, yielding a preliminary feature extractor that promotes fine-grained matching training. For fine-grained matching, it trains with a similarity calculation method based on stacked cross attention, yielding the final feature extractor, which in turn promotes coarse-grained matching training. Finally, we manually build an approximate olfactory noun list during fine-grained matching training; feeding it back into the fine-grained matching process yields significantly better performance, and the list can also be used in future research. Experiments on the MUSTI task dataset of MediaEval 2022 show that both the coarse-grained and fine-grained matching stages of the proposed model perform well, and that both F1 measures exceed those of the existing baseline models.
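The fine-grained stage is described only as a similarity calculation based on stacked cross attention, and focal loss appears only as a keyword. The following is a minimal sketch of what such a pipeline could look like, not the authors' implementation: each word vector attends over image region vectors, the per-word relevance scores are averaged into one text-image similarity, and a binary focal loss scores a predicted match probability. The function names, the temperature, and the `gamma`/`alpha` defaults are all assumptions chosen for illustration, and the toy 2-D feature vectors stand in for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [temperature * s for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_similarity(word_feats, region_feats, temperature=9.0):
    """Simplified text-to-image cross attention (hypothetical helper).

    For each word: attend over all regions, build an attended region
    vector, and score the word against it. The final similarity is the
    mean of the per-word scores. The temperature is an assumed value.
    """
    dim = len(region_feats[0])
    word_scores = []
    for word in word_feats:
        sims = [cosine(word, region) for region in region_feats]
        weights = softmax(sims, temperature)
        attended = [
            sum(w * region[d] for w, region in zip(weights, region_feats))
            for d in range(dim)
        ]
        word_scores.append(cosine(word, attended))
    return sum(word_scores) / len(word_scores)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p is the predicted probability of a match, y is the 0/1 label.
    gamma and alpha are assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy features: two "words" and two sets of image "regions".
words = [[1.0, 0.0], [0.0, 1.0]]
matching_regions = [[1.0, 0.0], [0.0, 1.0]]
mismatched_regions = [[-1.0, 0.0], [0.0, -1.0]]

sim_match = cross_attention_similarity(words, matching_regions)
sim_mismatch = cross_attention_similarity(words, mismatched_regions)
```

In this toy setup the aligned regions score higher than the opposed ones, and the focal loss shrinks as the predicted probability of the true class grows, which is the down-weighting of easy examples that focal loss is designed to provide.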


Pages: 669-677

Page count: 9
