A real-time and effective text detection method for multi-scale and fuzzy text

Cited by: 2
Authors
Tong, Guoxiang [1]
Dong, Ming [1]
Song, Yan [1]
Affiliation
[1] Univ Shanghai Sci & Technol, Dept Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Keywords
Natural scene text detection; Attention mechanism; Feature path augmentation; CIoU loss; SCENE; ACCURATE;
DOI
10.1007/s11554-023-01267-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text in natural scenes takes many forms, and motion blur and geometric perspective distortion greatly degrade text detection. To address this, a real-time and effective method is proposed for detecting multi-scale and fuzzy text. The method applies a convolutional attention mechanism to the feature extraction backbone to obtain more informative text feature maps. To make full use of the precise text location signals in low-level features, a bottom-up path augmentation is added. In addition, a few layers of the ResNet-50 backbone are removed to further shorten the information path and balance detection speed against accuracy. For the detection output, the four vertex coordinates of each text box are regressed with the help of the CIoU loss and shrunken text labels. The model processes an image in as little as 112 ms and achieves a higher comprehensive indicator value than the compared models on the ICDAR 2013, ICDAR 2015, and MSRA-TD500 datasets.
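As a rough illustration of the CIoU loss named in the abstract, the sketch below computes the commonly used CIoU formulation for two axis-aligned boxes. It is an assumption that this matches the paper's setup: the abstract describes regressing four vertex coordinates, and the exact way the authors combine CIoU with that regression is not stated here. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` parameter are hypothetical choices made for this example.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: a plain-Python version of the commonly used CIoU
    definition, not the paper's implementation.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Widths and heights of both boxes.
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w ** 2 + enc_h ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / (c2 + eps) + alpha * v
```

Unlike a plain IoU loss, the center-distance term keeps the loss informative even when the boxes do not overlap: two disjoint unit squares still yield a value that shrinks as their centers move closer.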
Pages: 15