Chest radiology report generation based on cross-modal multi-scale feature fusion

Cited by: 4
Authors
Pan, Yu [1 ]
Liu, Li-Jun [1,2,3]
Yang, Xiao-Bing [1 ]
Peng, Wei [1 ]
Huang, Qing-Song [1 ]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Comp Technol Applicat, Kunming, Peoples R China
[3] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Wujiaying St, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Report generation; Cross-modal; Multi-scale; Medical image; Attention mechanism; Deep learning;
DOI
10.1016/j.jrras.2024.100823
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. The accurate interpretation of radiological images and the automatic generation of radiology reports not only save doctors' time but also mitigate the risk of diagnostic errors. The core objective of automatic radiology report generation is to achieve precise mapping between visual features and lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features and textual features to generate radiology reports. However, these approaches may ignore key lesion areas and lack sensitivity to crucial lesion location information. Furthermore, achieving multi-scale characterization and fine-grained alignment of medical visual features and report text features proves challenging, reducing the quality of the generated radiology reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module is designed to guide the model to focus on the lesion region of the radiological image. Second, a channel attention network is employed to enhance the characterization of location information and disease features. Finally, a cross-modal feature fusion module is constructed by combining memory matrices, facilitating fine-grained alignment between multi-scale visual features and report text features at corresponding scales. The proposed method is experimentally evaluated on two publicly available radiological image datasets. The results demonstrate superior performance on BLEU and ROUGE metrics compared to existing methods. In particular, there are improvements of 4.8% in the ROUGE metric and 9.4% in the METEOR metric on the IU X-Ray dataset, and improvements of 7.4% in BLEU-1 and 7.6% in BLEU-2 on the MIMIC-CXR dataset.
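The channel attention step mentioned in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch. This is an assumption about the general technique only, not the authors' implementation; the function name `channel_attention` and the weight matrices `w1` and `w2` are hypothetical placeholders.

```python
import math

def channel_attention(features, w1, w2):
    """Minimal squeeze-and-excitation style channel attention (illustrative sketch).

    features: (C, H, W) feature map as nested lists
    w1: (C_reduced, C) reduction weights; w2: (C, C_reduced) expansion weights
    Returns the feature map with each channel rescaled by a learned weight.
    """
    # Squeeze: global average pool each channel down to one scalar.
    squeeze = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: bottleneck MLP (ReLU then sigmoid) yields per-channel scales in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Reweight: multiply every value in a channel by its learned scale.
    return [[[v * sc for v in row] for row in ch] for ch, sc in zip(features, scale)]

# Usage: 2 channels of a 2x2 feature map, bottlenecked to 1 hidden unit.
feats = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1 = [[0.1, 0.2]]      # (1, 2) reduction
w2 = [[0.3], [-0.3]]   # (2, 1) expansion
out = channel_attention(feats, w1, w2)
```

In a full model the sigmoid gate would sit on top of convolutional feature maps, letting the network emphasize channels that respond to lesion location and disease patterns while suppressing the rest.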
Pages: 9