Chest radiology report generation based on cross-modal multi-scale feature fusion

Cited by: 4
Authors
Pan, Yu [1]
Liu, Li-Jun [1,2,3]
Yang, Xiao-Bing [1]
Peng, Wei [1]
Huang, Qing-Song [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Comp Technol Applicat, Kunming, Peoples R China
[3] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Wujiaying St, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Report generation; Cross-modal; Multi-scale; Medical image; Attention mechanism; Deep learning;
DOI
10.1016/j.jrras.2024.100823
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. Accurate interpretation of radiological images and automatic generation of radiology reports not only save doctors' time but also mitigate the risk of diagnostic errors. The core objective of automatic radiology report generation is to achieve precise mapping between visual features and lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features and textual features to generate radiology reports. However, these approaches may overlook key lesion areas and lack sensitivity to crucial lesion location information. Furthermore, achieving multi-scale characterization and fine-grained alignment of medical visual features and report text features remains challenging, which reduces the quality of generated radiology reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module is designed to guide the model to focus on lesion regions of the radiological image. Second, a channel attention network is employed to enhance the characterization of location information and disease features. Finally, a cross-modal feature fusion module is constructed by combining memory matrices, enabling fine-grained alignment between multi-scale visual features and report text features at corresponding scales. The proposed method is evaluated on two publicly available radiological image datasets. The results demonstrate superior performance on BLEU and ROUGE metrics compared to existing methods. In particular, the method improves the ROUGE metric by 4.8% and the METEOR metric by 9.4% on the IU X-Ray dataset, and improves BLEU-1 by 7.4% and BLEU-2 by 7.6% on the MIMIC-CXR dataset.
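The record contains no implementation details beyond the abstract, but two of the three components it names lend themselves to a compact sketch. Below is a minimal, illustrative PyTorch sketch of a channel attention block and a memory-matrix cross-modal fusion step; every class name, dimension, and wiring choice here is an assumption made for illustration, not the paper's actual architecture.

```python
# Minimal sketch of two components named in the abstract. All names,
# dimensions, and wiring are illustrative assumptions; the paper's actual
# architecture may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (a hypothetical
    stand-in for the paper's channel attention network)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) visual feature map from one encoder scale
        weights = self.fc(x.mean(dim=(2, 3)))           # (B, C) channel weights
        return x * weights.unsqueeze(-1).unsqueeze(-1)  # re-weight channels


class MemoryFusion(nn.Module):
    """Cross-modal fusion via a shared learnable memory matrix: visual and
    text tokens both attend to the same memory slots. This is one plausible
    reading (an assumption) of the memory-matrix alignment in the abstract."""

    def __init__(self, dim: int, slots: int = 64, heads: int = 8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) sequence from either modality at one scale
        mem = self.memory.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.attn(tokens, mem, mem)  # tokens query the shared memory
        return tokens + fused                   # residual fusion


# Usage: align one visual scale with the report tokens through shared memory.
ca = ChannelAttention(channels=512)
fusion = MemoryFusion(dim=512)
vis = ca(torch.randn(2, 512, 7, 7))          # attended (B, C, H, W) features
vis_tokens = vis.flatten(2).transpose(1, 2)  # (B, 49, 512) visual tokens
txt_tokens = torch.randn(2, 60, 512)         # (B, T, 512) report embeddings
vis_aligned = fusion(vis_tokens)             # both modalities share memory
txt_aligned = fusion(txt_tokens)
```

Under this reading, routing visual and text tokens through the same memory slots gives both modalities a shared coordinate system at each scale, which is one plausible way to obtain the fine-grained cross-modal alignment the abstract describes.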
Pages: 9
Related Papers
50 items in total
  • [21] Cross-modal Deep Learning-based Clinical Recommendation System for Radiology Report Generation from Chest X-rays
    Shetty, S.
    Ananthanarayana, V. S.
    Mahale, A.
    INTERNATIONAL JOURNAL OF ENGINEERING, TRANSACTIONS B: APPLICATIONS, 2023, 36 (08): 1569 - 1577
  • [23] Visual-Textual Cross-Modal Interaction Network for Radiology Report Generation
    Zhang, Wenfeng
    Cai, Baoning
    Hu, Jianming
    Qin, Qibing
    Xie, Kezhen
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 984 - 988
  • [24] Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation
    Peng, Peixi
    Fan, Wanshu
    Shen, Yue
    Liu, Wenfei
    Yang, Xin
    Zhang, Qiang
    Wei, Xiaopeng
    Zhou, Dongsheng
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (12) : 7406 - 7419
  • [25] Knowledge-Guided Cross-Modal Alignment and Progressive Fusion for Chest X-Ray Report Generation
    Huang, Lili
    Cao, Yiming
    Jia, Pengcheng
    Li, Chenglong
    Tang, Jin
    Li, Chuanfu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 557 - 567
  • [26] Multi-grained Cross-Modal Feature Fusion Network for Diagnosis Prediction
    An, Ying
    Zhao, Zhenrui
    Chen, Xianlai
    BIOINFORMATICS RESEARCH AND APPLICATIONS, PT II, ISBRA 2024, 2024, 14955 : 221 - 232
  • [27] Estimation of Pig Weight Based on Cross-modal Feature Fusion Model
    He W.
    Mi Y.
    Liu G.
    Ding X.
    Li T.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 : 275 - 282, 329
  • [28] Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection
    Chen, Hao
    Li, Youfu
    Su, Dan
    PATTERN RECOGNITION, 2019, 86 : 376 - 385
  • [29] Joint feature fusion hashing for cross-modal retrieval
    Cao, Yuxia
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (12) : 6149 - 6162
  • [30] PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding
    Zhou, Honggu
    Peng, Xiaogang
    Luo, Yikai
    Wu, Zizhao
    MULTIMEDIA SYSTEMS, 2024, 30 (03)