Dual-Modal Transformer with Enhanced Inter- and Intra-Modality Interactions for Image Captioning

Cited by: 7

Authors:

Kumar, Deepika [1]

Srivastava, Varun [2]

Popescu, Daniela Elena [3]

Hemanth, Jude D. [4]

Affiliations:

[1] Bharati Vidyapeeths Coll Engn, Dept Comp Sci & Engn, New Delhi 110063, India

[2] Thapar Inst Engn & Technol, Dept Comp Sci & Engn, Patiala 147004, Punjab, India

[3] Univ Oradea, Fac Elect Engn & Informat Technol, Oradea 410087, Romania

[4] Karunya Inst Technol & Sci, Dept Elect & Commun Engn, Coimbatore 641114, Tamil Nadu, India

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 13

Keywords:

attention model; encoder-decoder model; multi-modal transformer; BLEU score; beam search;

DOI:

10.3390/app12136733

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Image captioning aims to describe an image in words that convey a semantically meaningful account of the depicted scene. Different models can be used for this difficult task, depending on the context and requirements. An encoder-decoder model that takes image feature vectors as input to the encoder is often regarded as one of the appropriate models for captioning. In the proposed work, a dual-modal transformer is used that captures intra- and inter-modality interactions simultaneously within an attention block. The transformer architecture is quantitatively evaluated on the publicly available Microsoft Common Objects in Context (MS COCO) dataset, yielding a Bilingual Evaluation Understudy (BLEU)-4 score of 85.01. The efficacy of the model is evaluated on the Flickr 8k, Flickr 30k, and MS COCO datasets, and the results are compared and analysed against state-of-the-art methods. The results show that the proposed model outperforms conventional models, such as the encoder-decoder model and the attention model.

Pages: 20

25 records in total

[1] Multimodal Affective Computing With Dense Fusion Transformer for Inter- and Intra-Modality Interactions
Deng, Huan
Yang, Zhenguo
Hao, Tianyong
Li, Qing
Liu, Wenyin
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6575 - 6587
[2] Visual Question Answering With Dense Inter- and Intra-Modality Interactions
Liu, Fei
Liu, Jing
Fang, Zhiwei
Hong, Richang
Lu, Hanqing
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3518 - 3529
[3] Illumination-Guided RGBT Object Detection With Inter- and Intra-Modality Fusion
Zhang, Yan
Yu, Huai
He, Yujie
Wang, Xinya
Yang, Wen
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
[4] Dual Global Enhanced Transformer for image captioning
Xian, Tiantao
Li, Zhixin
Zhang, Canlong
Ma, Huifang
NEURAL NETWORKS, 2022, 148 : 129 - 141
[5] Deep Learning Based Inter-modality Image Registration Supervised by Intra-modality Similarity
Cao, Xiaohuan
Yang, Jianhuan
Wang, Li
Xue, Zhong
Wang, Qian
Shen, Dinggang
MACHINE LEARNING IN MEDICAL IMAGING: 9TH INTERNATIONAL WORKSHOP, MLMI 2018, 2018, 11046 : 55 - 63
[6] StairwayGraphNet for Inter- and Intra-modality Multi-resolution Brain Graph Alignment and Synthesis
Mhiri, Islem
Mahjoub, Mohamed Ali
Rekik, Islem
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2021, 2021, 12966 : 140 - 150
[7] CurrI2P: inter- and intra-modality similarity curriculum learning for image-to-point cloud registration
Lin, Liwei
Lin, Chunyu
Nie, Lang
Huang, Shujuan
Zhao, Yao
VISUAL COMPUTER, 2025,
[8] FIRE: Unsupervised bi-directional inter- and intra-modality registration using deep networks
Wang, Chengjia
Yang, Guang
Papanastasiou, Giorgos
2021 IEEE 34TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2021, : 510 - 514
[9] Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention
Huang, Zhiqi
Liu, Fenglin
Wu, Xian
Ge, Shen
Wang, Helin
Fan, Wei
Zou, Yuexian
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 13098 - 13106
[10] Inter- and intra-modality reliability of magnetoencephalographic somatosensory localization utilizing pneumatic digit and median nerve stimulation
Carlson, C
Stout, J
Schevon, C
Kuzniecky, R
Devinsky, O
Pacia, S
NEUROLOGY, 2006, 66 (05) : A180 - A180
