Unsupervised language model adaptation for handwritten Chinese text recognition

Cited by: 14
Authors
Wang, Qiu-Feng [1]
Yin, Fei [1]
Liu, Cheng-Lin [1]
Affiliations
[1] Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition (NLPR), Beijing 100190, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Character string recognition; Chinese handwriting recognition; Unsupervised language model adaptation; Language model compression; Offline recognition; Character
DOI
10.1016/j.patcog.2013.09.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an approach to unsupervised language model adaptation (LMA) that uses multiple models for offline recognition of unconstrained handwritten Chinese text. The domain of the document to be recognized varies and is usually unknown a priori, so we use a two-pass recognition strategy with a predefined set of multi-domain language models. We propose three methods for dynamically generating an adaptive language model that matches the text output by first-pass recognition: model selection, model combination, and model reconstruction. In model selection, we choose the language model with the minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, combining the two compression methods substantially reduces the storage size of the language models with little loss of recognition accuracy. (C) 2013 Elsevier Ltd. All rights reserved.
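The model selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes add-alpha smoothed character unigram models in place of whatever language models the paper uses, and the domain names, toy corpora, and function names (`train_unigram`, `perplexity`, `select_model`) are hypothetical. Latin letters stand in for Chinese characters.

```python
import math
from collections import Counter


def train_unigram(text, vocab, alpha=1.0):
    """Add-alpha smoothed character unigram model over a fixed vocabulary.

    Assumption: a unigram model is used only to keep the sketch short.
    """
    counts = Counter(ch for ch in text if ch in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {ch: (counts[ch] + alpha) / total for ch in vocab}


def perplexity(model, text):
    """Perplexity of `model` on `text`; out-of-vocabulary characters are skipped."""
    chars = [ch for ch in text if ch in model]
    if not chars:
        return float("inf")
    log_prob = sum(math.log(model[ch]) for ch in chars)
    return math.exp(-log_prob / len(chars))


def select_model(models, first_pass_text):
    """Return the name of the domain model with minimum perplexity."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))


# Hypothetical toy corpora, one per domain.
corpora = {
    "news": "abababab",
    "ancient": "cdcdcdcd",
}
vocab = set("".join(corpora.values()))
models = {name: train_unigram(text, vocab) for name, text in corpora.items()}

# Text produced by a (hypothetical) first recognition pass.
first_pass_text = "cdcdab"
best = select_model(models, first_pass_text)
print(best)
```

In a two-pass setup, the selected model would then be used to rescore the document in the second pass. Because the first-pass text contains recognition errors, the perplexity estimate is noisy, which is presumably one motivation for the combination and reconstruction methods the abstract also proposes.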
Pages: 1202-1216
Number of pages: 15