Unsupervised language model adaptation for handwritten Chinese text recognition

Cited by: 14
Authors
Wang, Qiu-Feng [1]
Yin, Fei [1]
Liu, Cheng-Lin [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Character string recognition; Chinese handwriting recognition; Unsupervised language model adaptation; Language model compression; OFFLINE RECOGNITION; CHARACTER;
DOI
10.1016/j.patcog.2013.09.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an effective approach to unsupervised language model adaptation (LMA) using multiple models for offline recognition of unconstrained handwritten Chinese text. The domain of the document to be recognized varies and is usually unknown a priori, so we use a two-pass recognition strategy with a predefined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model that matches the text output by first-pass recognition: model selection, model combination, and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, combining the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy. © 2013 Elsevier Ltd. All rights reserved.
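The model selection step described in the abstract (choosing the language model with minimum perplexity on the first-pass recognized text) can be illustrated with a minimal sketch. This is not the authors' implementation: the character unigram models with add-alpha smoothing, the toy corpora, and the names `train_unigram`, `perplexity`, and `select_model` are all assumptions made for illustration. The paper's own language models and domain set are not specified in this record.

```python
import math
from collections import Counter


def train_unigram(text, vocab, alpha=1.0):
    """Character unigram model with add-alpha smoothing over a fixed vocabulary.

    Hypothetical stand-in for a real domain language model.
    """
    counts = Counter(ch for ch in text if ch in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {ch: (counts[ch] + alpha) / total for ch in vocab}


def perplexity(model, text):
    """Perplexity of a unigram model on text; out-of-vocabulary characters are skipped."""
    chars = [ch for ch in text if ch in model]
    if not chars:
        return float("inf")
    log_prob = sum(math.log(model[ch]) for ch in chars)
    return math.exp(-log_prob / len(chars))


def select_model(models, first_pass_text):
    """Return the name of the model with minimum perplexity on the first-pass output."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))


# Toy, made-up domain corpora (assumptions, not data from the paper).
corpora = {
    "news": "股市今日上涨经济增长报告发布",
    "ancient": "子曰学而时习之不亦说乎",
}
vocab = set("".join(corpora.values()))
models = {name: train_unigram(text, vocab) for name, text in corpora.items()}

# Pretend this string is the output of first-pass recognition.
first_pass_text = "学而时习之"
best = select_model(models, first_pass_text)
print(best)
```

In a two-pass setup, the selected model would then be used to rescore or re-recognize the document in the second pass. The sketch only covers the selection criterion, not the combination, reconstruction, or compression methods.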
Pages: 1202-1216
Page count: 15