Efficient Medical Images Text Detection with Vision-Language Pre-training Approach

Citations: 0

Authors:

Li, Tianyang [1,2]

Bai, Jinxu [1]

Wang, Qingzhu [1]

Xu, Hanwen [1]

Affiliations:

[1] Northeast Electric Power University, Computer Science, Jilin, People's Republic of China

[2] Jiangxi New Energy Technology Institute, Nanchang, Jiangxi, People's Republic of China

Source:

ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222 | 2023 / Vol. 222

Keywords:

vision-language pre-training; medical text detection; feature enhancement; differentiable binarization;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text detection in medical images is a critical task, essential for automating the extraction of valuable information from diverse healthcare documents. Conventional text detection methods, predominantly based on segmentation, encounter substantial challenges when confronted with text-rich images, extreme aspect ratios, and multi-oriented text. In response to these complexities, this paper introduces an innovative text detection system aimed at enhancing its efficacy. Our proposed system comprises two fundamental components: the Efficient Feature Enhancement Module (EFEM) and the Multi-Scale Feature Fusion Module (MSFM), both serving as integral elements of the segmentation head. The EFEM incorporates a spatial attention mechanism to improve segmentation performance by introducing multi-level information. The MSFM merges features from the EFEM at different depths and scales to generate final segmentation features. In conjunction with our segmentation methodology, our post-processing module employs a differentiable binarization technique, facilitating adaptive threshold adjustment to enhance text detection precision. To further bolster accuracy and robustness, we introduce the integration of a vision-language pre-training model. Through extensive pretraining on large-scale visual language understanding tasks, this model amasses a wealth of rich visual and semantic representations. When seamlessly integrated with the segmentation module, the pretraining model effectively leverages its potent representation capabilities. Our proposed model undergoes rigorous evaluation on medical text image datasets, consistently demonstrating exceptional performance. Benchmark experiments reaffirm its efficacy.
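The abstract names differentiable binarization but does not give its formula. As a minimal sketch, assuming the commonly used formulation in which a probability map P and a learned threshold map T are combined as B = 1 / (1 + exp(-k (P - T))), the step can be illustrated as follows. The function name, the sample values, and the amplification factor k = 50 are assumptions for illustration and are not taken from this record.

```python
import math

def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob   : 2-D list of per-pixel text probabilities P (hypothetical values)
    thresh : 2-D list of per-pixel thresholds T (hypothetical values)
    k      : steepness of the sigmoid; 50 is an assumed value
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]

# Toy 2x2 maps: pixels well above their threshold approach 1, pixels below approach 0.
prob = [[0.9, 0.2], [0.55, 0.5]]
thresh = [[0.3, 0.3], [0.5, 0.6]]
binary = differentiable_binarization(prob, thresh)
```

Because the hard step function is replaced by a steep sigmoid, gradients can flow to both maps, which is what would let the threshold adapt per pixel during training.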


Pages: 16

50 items in total

[31] Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training
Zhou, Wenlve
Zhou, Zhiheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 8201 - 8214
[32] Multimodal Pre-training Method for Vision-language Understanding and Generation
Liu T.-Y.
Wu Z.-X.
Chen J.-J.
Jiang Y.-G.
Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05) : 2024 - 2034
[33] Unified Vision-Language Pre-Training for Image Captioning and VQA
Zhou, Luowei
Palangi, Hamid
Zhang, Lei
Hu, Houdong
Corso, Jason J.
Gao, Jianfeng
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13041 - 13049
[34] Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts
Wang, Alex Jinpeng
Zhou, Pan
Shou, Mike Zheng
Yan, Shuicheng
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3406 - 3421
[35] Multi-Task Paired Masking With Alignment Modeling for Medical Vision-Language Pre-Training
Zhang, Ke
Yang, Yan
Yu, Jun
Jiang, Hanliang
Fan, Jianping
Huang, Qingming
Han, Weidong
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4706 - 4721
[36] Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Dou, Zi-Yi
Kamath, Aishwarya
Gan, Zhe
Zhang, Pengchuan
Wang, Jianfeng
Li, Linjie
Liu, Zicheng
Liu, Ce
LeCun, Yann
Peng, Nanyun
Gao, Jianfeng
Wang, Lijuan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[37] Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends
Gan, Zhe
Li, Linjie
Li, Chunyuan
Wang, Lijuan
Liu, Zicheng
Gao, Jianfeng
FOUNDATIONS AND TRENDS IN COMPUTER GRAPHICS AND VISION, 2022, 14 (3-4) : 163 - 352
[38] EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Chen, Junyi
Guo, Longteng
Sun, Jia
Shao, Shuai
Yuan, Zehuan
Lin, Liang
Zhang, Dongyu
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024 : 1110 - 1119
[39] Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Zhuge, Mingchen
Gao, Dehong
Fan, Deng-Ping
Jin, Linbo
Chen, Ben
Zhou, Haoming
Qiu, Minghui
Shao, Ling
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 12642 - 12652
[40] IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-Training
Liu, Che
Cheng, Sibo
Shi, Miaojing
Shah, Anand
Bai, Wenjia
Arcucci, Rossella
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2025, 44 (01) : 519 - 529
