Efficient Medical Images Text Detection with Vision-Language Pre-training Approach

被引:0
|
作者
Li, Tianyang [1 ,2 ]
Bai, Jinxu [1 ]
Wang, Qingzhu [1 ]
Xu, Hanwen [1 ]
机构
[1] Northeast Elect Power Univ, Comp Sci, Jilin, Peoples R China
[2] Jiangxi New Energy Technol Inst, Nanchang, Jiangxi, Peoples R China
关键词
vision-language pre-training; medical text detection; feature enhancement; differentiable binarization;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text detection in medical images is a critical task, essential for automating the extraction of valuable information from diverse healthcare documents. Conventional text detection methods, predominantly based on segmentation, encounter substantial challenges when confronted with text-rich images, extreme aspect ratios, and multi-oriented text. In response to these complexities, this paper introduces an innovative text detection system aimed at enhancing its efficacy. Our proposed system comprises two fundamental components: the Efficient Feature Enhancement Module (EFEM) and the Multi-Scale Feature Fusion Module (MSFM), both serving as integral elements of the segmentation head. The EFEM incorporates a spatial attention mechanism to improve segmentation performance by introducing multi-level information. The MSFM merges features from the EFEM at different depths and scales to generate final segmentation features. In conjunction with our segmentation methodology, our post-processing module employs a differentiable binarization technique, facilitating adaptive threshold adjustment to enhance text detection precision. To further bolster accuracy and robustness, we introduce the integration of a vision-language pre-training model. Through extensive pretraining on large-scale visual language understanding tasks, this model amasses a wealth of rich visual and semantic representations. When seamlessly integrated with the segmentation module, the pretraining model effectively leverages its potent representation capabilities. Our proposed model undergoes rigorous evaluation on medical text image datasets, consistently demonstrating exceptional performance. Benchmark experiments reaffirm its efficacy.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Enhancing medical text detection with vision-language pre-training and efficient segmentation
    Li, Tianyang
    Bai, Jinxu
    Wang, Qingzhu
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 3995 - 4007
  • [2] Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
    Xue, Chuhui
    Zhang, Wenqing
    Hao, Yu
    Lu, Shijian
    Torr, Philip H. S.
    Bai, Song
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 284 - 302
  • [3] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
    Moon, Jong Hak
    Lee, Hyungyung
    Shin, Woncheol
    Kim, Young-Hak
    Choi, Edward
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (12) : 6070 - 6080
  • [4] Vision-Language Pre-Training for Boosting Scene Text Detectors
    Song, Sibo
    Wan, Jianqiang
    Yang, Zhibo
    Tang, Jun
    Cheng, Wenqing
    Bai, Xiang
    Yao, Cong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15660 - 15670
  • [5] Survey on Vision-language Pre-training
    Yin J.
    Zhang Z.-D.
    Gao Y.-H.
    Yang Z.-W.
    Li L.
    Xiao M.
    Sun Y.-Q.
    Yan C.-G.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): : 2000 - 2023
  • [6] Position-guided Text Prompt for Vision-Language Pre-training
    Wang, Jinpeng
    Zhou, Pan
    Shou, Mike Zheng
    Yan, Shuicheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23242 - 23251
  • [7] Subsampling of Frequent Words in Text for Pre-training a Vision-Language Model
    Liang, Mingliang
    Larson, Martha
    PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023, : 61 - 67
  • [8] Improving Medical Speech-to-Text Accuracy using Vision-Language Pre-training Models
    Huh, Jaeyoung
    Park, Sangjoon
    Lee, Jeong Eun
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (03) : 1692 - 1703
  • [9] VLP: A Survey on Vision-language Pre-training
    Chen, Fei-Long
    Zhang, Du-Zhen
    Han, Ming-Lun
    Chen, Xiu-Yi
    Shi, Jing
    Xu, Shuang
    Xu, Bo
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (01) : 38 - 56
  • [10] VLP: A Survey on Vision-language Pre-training
    Fei-Long Chen
    Du-Zhen Zhang
    Ming-Lun Han
    Xiu-Yi Chen
    Jing Shi
    Shuang Xu
    Bo Xu
    Machine Intelligence Research, 2023, 20 (01) : 38 - 56