Unsupervised language model adaptation for handwritten Chinese text recognition

Cited by: 14
Authors
Wang, Qiu-Feng [1 ]
Yin, Fei [1 ]
Liu, Cheng-Lin [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Character string recognition; Chinese handwriting recognition; Unsupervised language model adaptation; Language model compression; OFFLINE RECOGNITION; CHARACTER;
DOI
10.1016/j.patcog.2013.09.015
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an effective approach for unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to be recognized is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model that matches the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach substantially improves recognition performance, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, the combination of the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy. (C) 2013 Elsevier Ltd. All rights reserved.
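The model selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes add-alpha smoothed character unigram models in place of the paper's language models, and the names `train_unigram`, `perplexity`, `select_model` and the two toy corpora are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def train_unigram(text, vocab, alpha=1.0):
    """Add-alpha smoothed character unigram model over a fixed vocabulary.

    Assumption: a unigram model stands in for a real n-gram language model.
    """
    counts = Counter(ch for ch in text if ch in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {ch: (counts[ch] + alpha) / total for ch in vocab}


def perplexity(model, text):
    """Per-character perplexity of text; out-of-vocabulary characters are skipped."""
    chars = [ch for ch in text if ch in model]
    if not chars:
        return float("inf")
    log_prob = sum(math.log(model[ch]) for ch in chars)
    return math.exp(-log_prob / len(chars))


def select_model(models, first_pass_text):
    """Return the name of the domain model with minimum perplexity."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))


# Hypothetical toy corpora standing in for the multi-domain model set.
corpora = {
    "news": "股市上涨股市下跌经济新闻",
    "ancient": "子曰学而时习之不亦说乎",
}
vocab = set("".join(corpora.values()))
models = {name: train_unigram(text, vocab) for name, text in corpora.items()}

# Pretend this string is the text produced by first-pass recognition.
selected = select_model(models, "学而时习之")
print(selected)
```

In a two-pass setup, the selected model would then be used to rescore the document in the second recognition pass; that step is omitted here.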
Pages: 1202-1216
Page count: 15
Related papers
50 in total
  • [21] Handwritten Chinese Text Recognition by Integrating Multiple Contexts
    Wang, Qiu-Feng
    Yin, Fei
    Liu, Cheng-Lin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (08) : 1469 - 1481
  • [22] Improving Handwritten Chinese Text Recognition by Confidence Transformation
    Wang, Qiu-Feng
    Yin, Fei
    Liu, Cheng-Lin
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 518 - 522
  • [23] Parsimonious HMMs for Offline Handwritten Chinese Text Recognition
    Wang, Wenchao
    Du, Jun
    Wang, Zi-Rui
    PROCEEDINGS 2018 16TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2018, : 145 - 150
  • [24] Common Sense Knowledge for Handwritten Chinese Text Recognition
    Qiu-Feng Wang
    Erik Cambria
    Cheng-Lin Liu
    Amir Hussain
    Cognitive Computation, 2013, 5 : 234 - 242
  • [25] Learning confidence transformation for handwritten Chinese text recognition
    Da-Han Wang
    Cheng-Lin Liu
    International Journal on Document Analysis and Recognition (IJDAR), 2014, 17 : 205 - 219
  • [26] Unsupervised language model adaptation based on automatic text collection from WWW
    Suzuki, Motoyuki
    Kajiura, Yasutomo
    Ito, Akinori
    Makino, Shozo
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2202 - 2205
  • [27] An approach for handwritten Chinese text recognition unifying character segmentation and recognition
    Yu, Ming-Ming
    Zhang, Heng
    Yin, Fei
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2024, 151
  • [28] Unsupervised cross-adaptation approach for speech recognition by combined language model and acoustic model adaptation
    School of Science and Engineering, Yamagata University, Yonezawa, Japan
    APSIPA ASC - Asia-Pac. Signal Inf. Process. Assoc. Annu. Summit Conf., (943-946):
  • [29] Optimizing the integration of a statistical language model in HMM based offline handwritten text recognition
    Zimmermann, M
    Bunke, H
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, 2004, : 541 - 544
  • [30] Unsupervised class-based language model adaptation for spontaneous speech recognition
    Yokoyama, T
    Shinozaki, T
    Iwano, K
    Furui, S
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 236 - 239