Dual-grained Text-Image Olfactory Matching Model with Mutual Promotion Stages

Cited by: 1

Authors:

Shao, Yi [1]

Sun, Jiande [1]

Jiang, Ye [1]

Li, Jing [2]

Affiliations:

[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China

[2] Shandong Normal Univ, Sch Journalism & Commun, Jinan, Shandong, Peoples R China

Source:

COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023

Keywords:

multimodal; text-image matching; olfactory representation; coarse-grained; fine-grained; cross-modal attention; focal loss;

DOI:

10.1145/3543873.3587649

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Olfactory experience has great advantages in awakening human memories and emotions, and may even surpass vision in some cases. Studies have shown that descriptions of olfactory scenes in images and text can also arouse human olfactory imagination, but few studies have addressed related problems from the perspective of computer vision and NLP. This paper proposes a multimodal model that detects similar olfactory experiences in paired text-image samples. The model has two stages, coarse-grained and fine-grained. It first uses a feature fusion method based on pre-trained CLIP for coarse-grained matching training, which yields a preliminary feature extractor that promotes fine-grained matching training. It then uses a similarity calculation method based on stacked cross attention for fine-grained matching training, which yields the final feature extractor that in turn promotes coarse-grained matching training. Finally, we manually build an approximate olfactory noun list during fine-grained matching training; it not only yields significantly better performance when fed back into the fine-grained matching process, but can also be used in future research. Experiments on the MUSTI task dataset of MediaEval 2022 show that both the coarse-grained and fine-grained matching stages of the proposed model perform well, and that both F1 measures exceed those of the existing baseline models.
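Note: the abstract names a similarity calculation based on stacked cross attention for the fine-grained stage but gives no formulas. The following is only a minimal sketch of the general idea (each word attends over image regions, and the word is then compared with its attended image vector), not the authors' implementation. The function names, the `sharpness` parameter, and the toy 2-dimensional vectors are assumptions made for illustration; a real system would use learned word and region features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(scores, sharpness=1.0):
    """Numerically stable softmax; larger `sharpness` gives peakier weights."""
    scaled = [s * sharpness for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_similarity(word_vecs, region_vecs, sharpness=4.0):
    """Text-to-image score: mean over words of cos(word, attended image vector)."""
    dim = len(region_vecs[0])
    per_word = []
    for word in word_vecs:
        # Attention weights come from word-region cosines, clipped at zero.
        sims = [max(cosine(word, region), 0.0) for region in region_vecs]
        weights = softmax(sims, sharpness)
        # Attended vector: attention-weighted sum of the region vectors.
        attended = [
            sum(wt * region[i] for wt, region in zip(weights, region_vecs))
            for i in range(dim)
        ]
        per_word.append(cosine(word, attended))
    return sum(per_word) / len(per_word)

# Toy usage with hypothetical features: two words, two image regions.
words = [[1.0, 0.0], [0.0, 1.0]]
aligned_regions = [[1.0, 0.0], [0.0, 1.0]]
opposed_regions = [[-1.0, 0.0], [0.0, -1.0]]
aligned_score = cross_attention_similarity(words, aligned_regions)
opposed_score = cross_attention_similarity(words, opposed_regions)
```

In this toy setup the aligned pair scores higher than the opposed pair, which is the behavior a fine-grained matcher would threshold or train against.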


Pages: 669-677

Page count: 9
