Multimodal image translation via deep learning inference model trained in video domain

Cited by: 1
Authors
Fan, Jiawei [1 ,2 ,3 ]
Liu, Zhiqiang [4 ]
Yang, Dong [1 ,2 ,3 ]
Qiao, Jian [1 ,2 ,3 ]
Zhao, Jun [1 ,2 ,3 ]
Wang, Jiazhou [1 ,2 ,3 ]
Hu, Weigang [1 ,2 ,3 ]
Affiliations
[1] Fudan Univ, Dept Radiat Oncol, Shanghai Canc Ctr, Shanghai 200032, Peoples R China
[2] Fudan Univ, Shanghai Med Coll, Dept Oncol, Shanghai 200032, Peoples R China
[3] Shanghai Key Lab Radiat Oncol, Shanghai 200032, Peoples R China
[4] Chinese Acad Med Sci & Peking Union Med Coll, Canc Hosp, Natl Clin Res Ctr Canc, Natl Canc Ctr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video domain; Deep learning; Medical image translation; GAN;
DOI
10.1186/s12880-022-00854-x
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification codes
1002; 100207; 1009;
Abstract
Background: Current medical image translation is implemented in the image domain. Considering that medical image acquisition is essentially a temporally continuous process, we attempt to develop a novel image translation framework, based on a deep learning model trained in the video domain, for generating synthesized computed tomography (CT) images from cone-beam computed tomography (CBCT) images.
Methods: For a proof-of-concept demonstration, CBCT and CT images from 100 patients were collected to demonstrate the feasibility and reliability of the proposed framework. The CBCT and CT images were registered as paired samples and used as input data for supervised model training. A vid2vid framework based on a conditional GAN, with carefully designed generators, discriminators and a new spatio-temporal learning objective, was applied to realize CBCT-to-CT image translation in the video domain. Four evaluation metrics, mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized cross-correlation (NCC), and structural similarity (SSIM), were calculated on all real and synthetic CT images from 10 new testing patients to assess model performance.
Results: The average values of MAE, PSNR, NCC, and SSIM were 23.27 +/- 5.53, 32.67 +/- 1.98, 0.99 +/- 0.0059, and 0.97 +/- 0.028, respectively. Most of the pixel-wise Hounsfield unit (HU) differences between the real and synthetic CT images were within 50. The synthetic CT images showed excellent agreement with the real CT images, and image quality was improved, with lower noise and fewer artifacts than the CBCT images.
Conclusions: We developed a deep-learning-based approach that addresses the medical image translation problem in the video domain. Although the feasibility and reliability of the proposed framework were demonstrated with CBCT-to-CT image translation, it can be easily extended to other types of medical images. The current results show that this is a very promising method that may pave a new path for medical image translation research.
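The four image-quality metrics reported in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names are assumptions, and the SSIM here is a simplified single-window (global) variant, whereas published results typically use the standard sliding-window SSIM.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(a - b)))

def psnr(a, b, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the HU/intensity span."""
    mse = np.mean((a - b) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def ncc(a, b):
    """Normalized cross-correlation (Pearson correlation of the two images)."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)))

def ssim_global(a, b, data_range):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float((2 * mu_a * mu_b + c1) * (2 * cov + c2)
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

In practice these would be evaluated slice by slice (or volume by volume) between each real CT and its synthesized counterpart, then averaged over the 10 testing patients.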
Pages: 9
Related papers
50 records total
  • [21] Unsupervised Deep Learning for Medical Image Translation
    Schaefferkoetter, Josh
    Ortega, Claudia
    Metser, Ur
    Veit-Haibach, Patrick
    JOURNAL OF NUCLEAR MEDICINE, 2020, 61
  • [22] Domain Adaptation for In-Air to Underwater Image Enhancement via Deep Learning
    Bing, Xuewen
    Ren, Wenqi
    Tang, Yang
    Yen, Gary G.
    Sun, Qiyu
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (01): 1015 - 1029
  • [23] Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading
    Zhou, Huan
    Li, Mingze
    Wang, Ning
    Min, Geyong
    Wu, Jie
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (02) : 475 - 488
  • [24] Analysing deep reinforcement learning agents trained with domain randomisation
    Dai, Tianhong
    Arulkumaran, Kai
    Gerbert, Tamara
    Tukra, Samyakh
    Behbahani, Feryal
    Bharath, Anil Anthony
    NEUROCOMPUTING, 2022, 493 : 143 - 165
  • [25] Enhancing medical image classification via federated learning and pre-trained model
    Srinivasu, Parvathaneni Naga
    Lakshmi, G. Jaya
    Narahari, Sujatha Canavoy
    Shafi, Jana
    Choi, Jaeyoung
    Ijaz, Muhammad Fazal
    EGYPTIAN INFORMATICS JOURNAL, 2024, 27
  • [26] Unsupervised multi-domain multimodal image-to-image translation with explicit domain-constrained disentanglement
    Xia, Weihao
    Yang, Yujiu
    Xue, Jing-Hao
    NEURAL NETWORKS, 2020, 131 : 50 - 63
  • [27] Image Captioning Using Multimodal Deep Learning Approach
    Farkh, Rihem
    Oudinet, Ghislain
    Foued, Yasser
    COMPUTERS, MATERIALS AND CONTINUA, 2024, 81 (03): 3951 - 3968
  • [28] Multimodal Deep Learning in Semantic Image Segmentation: A Review
    Raman, Vishal
    Kumari, Madhu
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTERNET OF THINGS (CCIOT 2018), 2018, : 7 - 11
  • [29] Multimodal Visibility Deep Learning Model Based on Visible-Infrared Image Pair
    Shen K.
    Shi Q.
    Wang H.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (06): 939 - 946
  • [30] Design of Multimodal Retrieval Model for Translation Domain Based on BERT
    Sheng, Xia
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 168 - 172