Unsupervised language model adaptation for handwritten Chinese text recognition

Cited by: 14
Authors
Wang, Qiu-Feng [1]
Yin, Fei [1]
Liu, Cheng-Lin [1]
Affiliations
[1] Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition (NLPR), Beijing 100190, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Character string recognition; Chinese handwriting recognition; Unsupervised language model adaptation; Language model compression; Offline recognition; Character
DOI
10.1016/j.patcog.2013.09.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an effective approach to unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to be recognized is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model that matches the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance substantially, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, the combination of the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy. © 2013 Elsevier Ltd. All rights reserved.
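The model selection step described in the abstract (pick the language model with minimum perplexity on the first-pass recognized text) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes character unigram models with add-one smoothing instead of the higher-order language models a real recognizer would use, and the toy corpora, domain names and function names are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(text, vocab):
    """Character unigram model with add-one smoothing over a fixed vocabulary.

    Assumption: a unigram model stands in for the paper's language models.
    """
    counts = Counter(ch for ch in text if ch in vocab)
    total = sum(counts.values()) + len(vocab)
    return {ch: (counts[ch] + 1) / total for ch in vocab}


def perplexity(model, text):
    """Perplexity of `text` under `model`; out-of-vocabulary characters are skipped."""
    chars = [ch for ch in text if ch in model]
    if not chars:
        return float("inf")
    log_prob = sum(math.log(model[ch]) for ch in chars)
    return math.exp(-log_prob / len(chars))


def select_model(models, first_pass_text):
    """Return the name of the model with minimum perplexity on the first-pass text."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))


# Hypothetical toy domains; ASCII letters stand in for Chinese characters.
vocab = set("abcd")
corpora = {"news": "abababab", "ancient": "cdcdcdcd"}
models = {name: train_unigram(text, vocab) for name, text in corpora.items()}

# The first-pass output may contain recognition errors; selection only needs
# it to be closer to one domain than to the others.
best = select_model(models, "cdcc")
print(best)
```

In a two-pass setup, the selected model would then replace the generic language model for the second recognition pass.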
Pages: 1202-1216
Page count: 15
Related papers
50 in total
  • [31] Attention Combination of Sequence Models for Handwritten Chinese Text Recognition
    Zhu, Zheng-Yu
    Yin, Fei
    Wang, Da-Han
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 288 - 294
  • [32] Fully Convolutional Recurrent Network for Handwritten Chinese Text Recognition
    Xie, Zecheng
    Sun, Zenghui
    Jin, Lianwen
    Feng, Ziyong
    Zhang, Shuye
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 4011 - 4016
  • [33] Unsupervised crosslingual adaptation of tokenisers for spoken language recognition
    Ng, Raymond W. M.
    Nicolao, Mauro
    Hain, Thomas
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 327 - 342
  • [34] Multiple handwritten text line recognition systems derived from specific integration of a language model
    Bertolami, R
    Bunke, H
    EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 521 - 525
  • [35] Fast writer adaptation with style extractor network for handwritten text recognition
    Wang, Zi-Rui
    Du, Jun
    NEURAL NETWORKS, 2022, 147 : 42 - 52
  • [36] Unsupervised Domain Adaptation via Class Aggregation for Text Recognition
    Liu, Xiao-Qian
    Ding, Xue-Ying
    Luo, Xin
    Xu, Xin-Shun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5617 - 5630
  • [37] Unsupervised Adaptation of Neural Networks for Chinese Handwriting Recognition
    Yang, Hong-Ming
    Zhang, Xu-Yao
    Yin, Fei
    Luo, Zhenbo
    Liu, Cheng-Lin
    PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2016, : 512 - 517
  • [38] A Compact CNN-DBLSTM Based Character Model For Online Handwritten Chinese Text Recognition
    Chen, Kai
    Tian, Li
    Ding, Haisong
    Cai, Meng
    Sun, Lei
    Liang, Sen
    Huo, Qiang
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1068 - 1073
  • [39] A Bayesian-based probabilistic model for unconstrained handwritten offline Chinese text line recognition
    Li, Nanxi
    Jin, Lianwen
    IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010, : 3664 - 3668
  • [40] Deep Neural Network based Hidden Markov Model for Offline Handwritten Chinese Text Recognition
    Du, Jun
    Wang, Zi-Rui
    Zhai, Jian-Fang
    Hu, Jin-Shui
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3428 - 3433