Chest radiology report generation based on cross-modal multi-scale feature fusion

Cited by: 4
Authors
Pan, Yu [1]
Liu, Li-Jun [1,2,3]
Yang, Xiao-Bing [1]
Peng, Wei [1]
Huang, Qing-Song [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Comp Technol Applicat, Kunming, Peoples R China
[3] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Wujiaying St, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Report generation; Cross-modal; Multi-scale; Medical image; Attention mechanism; Deep learning;
DOI
10.1016/j.jrras.2024.100823
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. The accurate interpretation of radiological images and the automatic generation of radiology reports not only save doctors' time but also mitigate the risk of diagnostic errors. The core objective of automatic radiology report generation is to achieve precise mapping between visual features and lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features and textual features to generate radiology reports. However, these approaches may ignore key lesion areas and lack sensitivity to crucial lesion location information. Furthermore, achieving multi-scale characterization and fine-grained alignment of medical visual features and report text features proves challenging, reducing the quality of the generated reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module is designed to guide the model to focus on the lesion region of the radiological image. Second, a channel attention network is employed to enhance the characterization of location information and disease features. Finally, a cross-modal feature fusion module is constructed by combining memory matrices, facilitating fine-grained alignment between multi-scale visual features and report text features at corresponding scales. The proposed method is experimentally evaluated on two publicly available radiological image datasets. The results demonstrate superior performance on BLEU and ROUGE metrics compared to existing methods. In particular, there are improvements of 4.8% in the ROUGE metric and 9.4% in the METEOR metric on the IU X-Ray dataset, and improvements of 7.4% in BLEU-1 and 7.6% in BLEU-2 on the MIMIC-CXR dataset.
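The abstract's second component, a channel attention network, can be illustrated with a minimal squeeze-and-excitation-style sketch. This is a generic illustration, not the paper's exact architecture (which the abstract does not specify); the function name, the reduction-ratio bottleneck, and the hand-supplied weight matrices `w1`/`w2` are all assumptions for demonstration.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Generic squeeze-and-excitation-style channel attention sketch.

    x:  feature map of shape (C, H, W)
    w1: (C//r, C) squeeze (bottleneck) weights
    w2: (C, C//r) excitation weights
    Returns the feature map with each channel reweighted by a
    learned gate in (0, 1).
    """
    # Squeeze: global average pooling over the spatial dimensions
    z = x.mean(axis=(1, 2))                # (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating
    s = np.maximum(w1 @ z, 0.0)            # (C//r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # (C,) per-channel weights
    # Rescale each channel by its attention weight (broadcast over H, W)
    return x * a[:, None, None]
```

In a trained network `w1` and `w2` are learned parameters; channels whose gate approaches 1 (e.g. those encoding lesion location cues) are preserved, while less informative channels are suppressed.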
Pages: 9
Related papers
50 records in total
  • [31] Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining
    Ye, Zhaoda
    Peng, Yuxin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (04)
  • [32] Kinship verification based on multi-scale feature fusion
    Yan, C.
    Liu, Y.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (40) : 88069 - 88090
  • [33] Drone Detection Based on Multi-scale Feature Fusion
    Zeng, Zhenni
    Wang, Zhenning
    Qin, Lang
    Li, Hui
    2021 6TH INTERNATIONAL CONFERENCE ON UK-CHINA EMERGING TECHNOLOGIES (UCET 2021), 2021, : 194 - 198
  • [34] MSCM-Net: Rail Surface Defect Detection Based on a Multi-Scale Cross-Modal Network
    Wen, Xin
    Zheng, Xiao
    He, Yu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2025, 82 (03): : 4371 - 4388
  • [35] Robust indoor localization based on multi-modal information fusion and multi-scale sequential feature extraction
    Wang, Qinghu
    Jia, Jie
    Chen, Jian
    Deng, Yansha
    Wang, Xingwei
    Aghvami, Abdol Hamid
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 155 : 164 - 178
  • [36] Vulnerability detection through cross-modal feature enhancement and fusion
    Tao, Wenxin
    Su, Xiaohong
    Wan, Jiayuan
    Wei, Hongwei
    Zheng, Weining
    COMPUTERS & SECURITY, 2023, 132
  • [37] Fake News Detection via Multi-scale Semantic Alignment and Cross-modal Attention
    Wang, Jiandong
    Zhang, Hongguang
    Liu, Chun
    Yang, Xiongjun
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2406 - 2410
  • [38] Multi-scale Cross-Modal Transformer Network for RGB-D Object Detection
    Xiao, Zhibin
    Xie, Pengwei
    Wang, Guijin
    MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 352 - 363
  • [39] Deep Label Feature Fusion Hashing for Cross-Modal Retrieval
    Ren, Dongxiao
    Xu, Weihua
    Wang, Zhonghua
    Sun, Qinxiu
    IEEE ACCESS, 2022, 10 : 100276 - 100285
  • [40] CMFFN: An efficient cross-modal feature fusion network for semantic
    Zhang, Yingjian
    Li, Ning
    Jiao, Jichao
    Ai, Jiawen
    Yan, Zheng
    Zeng, Yingchao
    Zhang, Tianxiang
    Li, Qian
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2025, 186