BLSTM-based handwritten text recognition using Web resources

Authors
Oprean, Cristina [1, 2]
Likforman-Sulem, Laurence [1, 2]
Mokbel, Chafic [3]
Popescu, Adrian [4]
Affiliations
[1] Telecom ParisTech, Inst Mines Telecom, Paris, France
[2] CNRS LTCI, Paris, France
[3] Univ Balamand, Fac Engn, Tripoli, Lebanon
[4] CEA, LIST, Vis & Content Engn Lab, Gif Sur Yvette, France
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handwriting recognition systems usually rely on static dictionaries and language models. These dictionaries generally cannot achieve full coverage on unrestricted document corpora because of Out-Of-Vocabulary (OOV) words. In previous work, dynamic dictionaries were built from Web resources and successfully applied to isolated word recognition. In the present work we extend this approach to text-line recognition. Exploiting dynamic dictionaries requires segmenting lines into words, which is performed using BLSTM classifiers to align filler models and word sequence outputs. Words are then classified, based on their confidence scores, into anchor words (AWs) and non-anchor words (NAWs). AWs are equated to the BLSTM outputs and used as such. Dynamic dictionaries are built for NAWs by exploiting Web resources for their character sequence and for neighboring AWs. Text lines are then decoded again using the dynamic dictionaries and a re-estimated language model. We conduct experiments on the publicly available RIMES database and show that introducing the dynamic dictionary is beneficial. Equally important, we show that the gain increases as the proportion of OOVs increases.
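The anchor/non-anchor split and the per-word dynamic dictionary described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the confidence threshold of 0.8, the `RecognizedWord` type, the use of `difflib.SequenceMatcher` as the similarity measure, and the in-memory `web_vocabulary` list standing in for Web resources are all assumptions made for illustration. The sketch also ignores the neighboring AWs that the abstract says are used when building the dictionaries.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class RecognizedWord:
    text: str          # character sequence output by the recognizer
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def split_anchor_words(words, threshold=0.8):
    """Return True for anchor words (AWs), False for non-anchor words (NAWs).

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return [w.confidence >= threshold for w in words]


def build_dynamic_dictionary(naw_text, web_vocabulary, size=3):
    """Keep the `size` candidates most similar to the NAW's character sequence.

    String similarity is an assumed stand-in for the Web-based lookup.
    """
    ranked = sorted(
        web_vocabulary,
        key=lambda cand: SequenceMatcher(None, naw_text, cand).ratio(),
        reverse=True,
    )
    return ranked[:size]


# Toy first-pass output for one text line (made-up values).
line = [
    RecognizedWord("veuillez", 0.95),
    RecognizedWord("agreer", 0.91),
    RecognizedWord("salutatoins", 0.42),
]
# Hypothetical vocabulary, standing in for words harvested from the Web.
web_vocabulary = ["salutations", "solutions", "salut", "agreer", "veuillez"]

is_anchor = split_anchor_words(line)
# One dynamic dictionary per NAW; AWs keep their recognized text unchanged.
dictionaries = {
    w.text: build_dynamic_dictionary(w.text, web_vocabulary)
    for w, anchor in zip(line, is_anchor)
    if not anchor
}
```

In a full system the line would then be decoded a second time, restricting each NAW position to its dictionary; that second pass is outside the scope of this sketch.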
Pages: 466-470
Page count: 5
Related papers
50 in total
  • [1] Offline Handwritten Text Recognition Using Hybrid CNN-BLSTM Network
    Namdeo, Rahul Kumar
    Gupta, Chetan
    Shrivastava, Ritu
    Proceedings - 2022 IEEE 11th International Conference on Communication Systems and Network Technologies, CSNT 2022, 2022, : 318 - 323
  • [2] ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION
    Lohrenz, Timo
    Strake, Maximilian
    Fingscheidt, Tim
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 516 - 523
  • [3] Unconstrained Handwritten Word Recognition based on Trigrams Using BLSTM
    Zhang, Xi
    Tan, Chew Lim
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2914 - 2919
  • [4] BLSTM-BASED CONFIDENCE ESTIMATION FOR END-TO-END SPEECH RECOGNITION
    Ogawa, Atsunori
    Tawara, Naohiro
    Kano, Takatomo
    Delcroix, Marc
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6383 - 6387
  • [5] Deep BLSTM Neural Networks for Unconstrained Continuous Handwritten Text Recognition
    Frinken, Volkmar
    Uchida, Seiichi
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 911 - 915
  • [6] Web Application System of Handwritten Text Recognition
    Bodnia, Yevhen
    Kozulia, Mariia
    COLINS 2021: COMPUTATIONAL LINGUISTICS AND INTELLIGENT SYSTEMS, VOL I, 2021, 2870
  • [7] Text Recognition using Deep BLSTM Networks
    Ray, Anupama
    Rajeswar, Sai
    Chaudhury, Santanu
    2015 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2015, : 207 - +
  • [8] Handwritten word recognition using Web resources and recurrent neural networks
    Cristina Oprean
    Laurence Likforman-Sulem
    Adrian Popescu
    Chafic Mokbel
    International Journal on Document Analysis and Recognition (IJDAR), 2015, 18 : 287 - 301
  • [9] Handwritten word recognition using Web resources and recurrent neural networks
    Oprean, Cristina
    Likforman-Sulem, Laurence
    Popescu, Adrian
    Mokbel, Chafic
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2015, 18 (04) : 287 - 301
  • [10] Cascading BLSTM Networks For Handwritten Word Recognition
    Stuner, Bruno
    Chatelain, Clement
    Paquet, Thierry
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3416 - 3421