Soft set-based MSER end-to-end system for occluded scene text detection, recognition and prediction

Cited by: 0

Authors:

Das, Alloy ^{[1]}

Palaiahnakote, Shivakumara ^{[2]}

Banerjee, Ayan ^{[1]}

Antonacopoulos, Apostolos ^{[2]}

Pal, Umapada ^{[1]}

Affiliations:

[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India

[2] Univ Salford, Pattern Recognit & Image Anal PRImA Res Lab, Manchester, England

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 305

Keywords:

Scene text detection; Scene text recognition; Scene text correction; Occluded scene text; Graph neural network; Convolutional recurrent neural network; Convolutional neural network;

DOI:

10.1016/j.knosys.2024.112593

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The presence of unpredictable occlusions on natural scene text is a significant challenge, exacerbating the difficulties that the variability of such images already poses for text detection and recognition. To meet the need for a robust, consistently performing approach to these challenges, this paper presents a new Soft Set-based end-to-end system for text detection, recognition and prediction in occluded natural scene images. This is the first approach to integrate text detection, recognition and prediction, unlike existing systems developed for end-to-end text spotting (text detection and recognition) only. For candidate text component detection, the proposed combination of Soft Sets with Maximally Stable Extremal Regions (SSMSER) improves text detection and spotting in natural scene images, irrespective of the presence of arbitrarily oriented and shaped text, complex backgrounds and occlusion. Furthermore, a Graph Recurrent Neural Network is proposed for grouping candidate text components into text lines and for fitting accurate bounding boxes to each word. Finally, a Convolutional Recurrent Neural Network (CRNN) is proposed for recognizing text and for predicting characters missing due to occlusion. Experimental results on a new occluded scene text dataset (OSTD) and on the most relevant benchmark natural scene text datasets demonstrate that the proposed system outperforms the state-of-the-art in text detection, recognition and prediction. The code and dataset are available at https://github.com/alloydas/Softset-MSER-Based-Occluded-Scene-Text-Spotting/blob/master/Soft_set_MSER.ipynb

Pages: 19
