Multimodal image translation via deep learning inference model trained in video domain

Cited by: 1
Authors
Fan, Jiawei [1 ,2 ,3 ]
Liu, Zhiqiang [4 ]
Yang, Dong [1 ,2 ,3 ]
Qiao, Jian [1 ,2 ,3 ]
Zhao, Jun [1 ,2 ,3 ]
Wang, Jiazhou [1 ,2 ,3 ]
Hu, Weigang [1 ,2 ,3 ]
Affiliations
[1] Fudan Univ, Dept Radiat Oncol, Shanghai Canc Ctr, Shanghai 200032, Peoples R China
[2] Fudan Univ, Shanghai Med Coll, Dept Oncol, Shanghai 200032, Peoples R China
[3] Shanghai Key Lab Radiat Oncol, Shanghai 200032, Peoples R China
[4] Chinese Acad Med Sci & Peking Union Med Coll, Canc Hosp, Natl Clin Res Ctr Canc, Natl Canc Ctr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video domain; Deep learning; Medical image translation; GAN;
DOI
10.1186/s12880-022-00854-x
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Subject classification codes
1002 ; 100207 ; 1009 ;
Abstract
Background: Current medical image translation is implemented in the image domain. Considering that medical image acquisition is essentially a temporally continuous process, we attempted to develop a novel image translation framework, trained via deep learning in the video domain, for generating synthesized computed tomography (CT) images from cone-beam computed tomography (CBCT) images.
Methods: For a proof-of-concept demonstration, CBCT and CT images from 100 patients were collected to demonstrate the feasibility and reliability of the proposed framework. The CBCT and CT images were registered as paired samples and used as the input data for supervised model training. A vid2vid framework based on the conditional GAN network, with carefully designed generators, discriminators, and a new spatio-temporal learning objective, was applied to realize CBCT-to-CT image translation in the video domain. Four evaluation metrics, mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized cross-correlation (NCC), and structural similarity (SSIM), were calculated on all real and synthetic CT images from 10 new testing patients to assess model performance.
Results: The average values of the four evaluation metrics (MAE, PSNR, NCC, and SSIM) were 23.27 +/- 5.53, 32.67 +/- 1.98, 0.99 +/- 0.0059, and 0.97 +/- 0.028, respectively. Most of the pixel-wise Hounsfield unit differences between real and synthetic CT images were within 50. The synthetic CT images agree closely with the real CT images, and their image quality is improved, with lower noise and fewer artifacts than the CBCT images.
Conclusions: We developed a deep-learning-based approach that addresses the medical image translation problem in the video domain. Although the feasibility and reliability of the proposed framework were demonstrated on CBCT-to-CT image translation, it can easily be extended to other types of medical images. The current results indicate that this is a promising method that may open a new path for medical image translation research.
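The abstract reports MAE, PSNR, NCC, and SSIM between registered real and synthetic CT volumes but does not give the exact evaluation code. The following is a minimal sketch of how these four metrics could be computed; the array names, the assumed Hounsfield-unit dynamic range passed as data_range, and the use of scikit-image for SSIM are assumptions, not the authors' implementation.

```python
# Minimal sketch of the four reported image-quality metrics (MAE, PSNR, NCC, SSIM)
# between a registered real CT volume and a synthetic CT volume.
# Assumptions: volumes are float arrays in Hounsfield units (HU) with an assumed
# dynamic range of 2000 HU; SSIM is computed with scikit-image.
import numpy as np
from skimage.metrics import structural_similarity


def mae(real: np.ndarray, synth: np.ndarray) -> float:
    """Mean absolute error in HU."""
    return float(np.mean(np.abs(real - synth)))


def psnr(real: np.ndarray, synth: np.ndarray, data_range: float = 2000.0) -> float:
    """Peak signal-to-noise ratio in dB; data_range is the assumed HU dynamic range."""
    mse = np.mean((real - synth) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


def ncc(real: np.ndarray, synth: np.ndarray) -> float:
    """Normalized cross-correlation of the zero-mean volumes."""
    r = real - real.mean()
    s = synth - synth.mean()
    return float(np.sum(r * s) / (np.sqrt(np.sum(r ** 2)) * np.sqrt(np.sum(s ** 2))))


def ssim(real: np.ndarray, synth: np.ndarray, data_range: float = 2000.0) -> float:
    """Structural similarity computed with scikit-image."""
    return float(structural_similarity(real, synth, data_range=data_range))


if __name__ == "__main__":
    # Placeholder volumes (slices x height x width) standing in for real/synthetic CT.
    rng = np.random.default_rng(0)
    real_ct = rng.normal(0.0, 200.0, size=(16, 128, 128)).astype(np.float32)
    synth_ct = real_ct + rng.normal(0.0, 20.0, size=real_ct.shape).astype(np.float32)
    print(f"MAE:  {mae(real_ct, synth_ct):.2f} HU")
    print(f"PSNR: {psnr(real_ct, synth_ct):.2f} dB")
    print(f"NCC:  {ncc(real_ct, synth_ct):.4f}")
    print(f"SSIM: {ssim(real_ct, synth_ct):.4f}")
```

In practice these metrics would be evaluated per patient over the registered CT/synthetic-CT pairs from the 10 testing patients and then averaged, which is consistent with the mean +/- standard deviation values quoted in the Results.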
Pages: 9