SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Cited by: 0

Authors:

Bartz, Christian [1]

Yang, Haojin [1]

Meinel, Christoph [1]

Affiliation:

[1] Univ Potsdam, Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition, that can be optimized end-to-end. Most existing works consist of multiple deep neural networks and several pre-processing steps. In contrast to this, we propose to use a single deep neural network, that learns to detect and recognize text from natural images, in a semi-supervised way. SEE is a network that integrates and jointly learns a spatial transformer network, which can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We introduce the idea behind our novel approach and show its feasibility, by performing a range of experiments on standard benchmark datasets, where we achieve competitive results.


Pages: 6674-6681

Page count: 8
