Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

Cited by: 11

Authors:

Huang, Jia-Hong [1]

Wu, Ting-Wei [2]

Worring, Marcel [1]

Affiliations:

[1] Univ Amsterdam, Amsterdam, Netherlands

[2] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21) | 2021

Keywords:

OPTIC-NERVE; CLASSIFICATION;

DOI:

10.1145/3460426.3463667

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Medical image captioning automatically generates a medical description of the content of a given medical image. Traditional medical image captioning models create the description from a single medical image input only, which makes it hard to generate an abstract medical description or concept and limits the effectiveness of medical image captioning. Multi-modal medical image captioning is one approach to addressing this problem. In multi-modal medical image captioning, textual input, e.g., expert-defined keywords, is considered one of the main drivers of medical description generation. Thus, effectively encoding both the textual input and the medical image is important for the task. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed. The proposed approach is built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Evaluated on an existing multi-modal medical image captioning dataset, the proposed model shows an increase of +53.2% in BLEU-avg and +18.6% in CIDEr compared with the state-of-the-art method.
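Note: the abstract names masked self-attention as one component but does not describe how it is configured. The following is a minimal, generic sketch of causal (masked) self-attention in plain Python, not the authors' implementation. The function names `softmax` and `masked_self_attention`, the toy input `tokens`, and the choice to use the inputs directly as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(x):
    """Causal (masked) self-attention over a sequence of vectors.

    x: list of T token vectors, each a list of d floats.
    Assumption of this sketch: queries, keys, and values are x itself,
    with no learned projection matrices.
    Returns (outputs, weights), where weights[t][s] == 0.0 for s > t.
    """
    T, d = len(x), len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for t in range(T):
        # Score position t only against positions 0..t; later ones are masked.
        scores = [
            sum(a * b for a, b in zip(x[t], x[s])) / scale
            for s in range(t + 1)
        ]
        # Masked positions receive exactly zero attention weight.
        w = softmax(scores) + [0.0] * (T - t - 1)
        weights.append(w)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w[s] * x[s][j] for s in range(T)) for j in range(d)])
    return outputs, weights

# Hypothetical toy sequence of three 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = masked_self_attention(tokens)
```

In this sketch the first position can attend only to itself, so its output equals its input; each later position mixes itself with the positions before it and never with those after it.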


Pages: 645-652

Page count: 8

Total: 50 records

[1] Multi-Modal Image Captioning for the Visually Impaired
Ahsan, Hiba
Bhalla, Nikita
Bhatt, Daivat
Shah, Kaivankumar
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 53 - 60
[2] Multi-Modal Graph Aggregation Transformer for image captioning
Chen, Lizhi
Li, Kesen
NEURAL NETWORKS, 2025, 181
[3] Multi-modal Dense Video Captioning
Iashin, Vladimir
Rahtu, Esa
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4117 - 4126
[4] Multi-modal Dependency Tree for Video Captioning
Zhao, Wentian
Wu, Xinxiao
Luo, Jiebo
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[5] Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph
Zhao, Wentian
Wu, Xinxiao
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2659 - 2670
[6] Fine-tuning with Multi-modal Entity Prompts for News Image Captioning
Zhang, Jingjing
Fang, Shancheng
Mao, Zhendong
Zhang, Zhiwei
Zhang, Yongdong
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4365 - 4373
[7] Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
Harzig, Philipp
Zecha, Dan
Lienhart, Rainer
Kaiser, Carolin
Schallner, Rene
2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 419 - 424
[8] Learning Cross-modal Representations with Multi-relations for Image Captioning
Cheng, Peng
Le, Tung
Racharak, Teeradaj
Cao Yiming
Kong Weikun
Minh Le Nguyen
PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 346 - 353
[9] LEARNING OPTIMAL SHAPE REPRESENTATIONS FOR MULTI-MODAL IMAGE REGISTRATION
Grossiord, Eloise
Risser, Laurent
Kanoun, Salim
Ken, Soleakhena
Malgouyres, Francois
2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 722 - 725
[10] GLOVE-ING ATTENTION: A MULTI-MODAL NEURAL LEARNING APPROACH TO IMAGE CAPTIONING
Anundskas, Lars Halvor
Afridi, Hina
Tarekegn, Adane Nega
Yamin, Muhammad Mudassar
Ullah, Mohib
Yamin, Saira
Cheikh, Faouzi Alaya
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
