Data Augmentation and Text Recognition on Khmer Historical Manuscripts

Cited by: 6
Authors
Valy, Dona [1]
Verleysen, Michel [2]
Chhun, Sophea [1]
Affiliations
[1] Inst Technol Cambodia, Dept Informat & Commun Engn, Phnom Penh, Cambodia
[2] Catholic Univ Louvain, ICTEAM Inst, Ottignies, Belgium
Keywords
historical document analysis; palm leaf manuscript; neural network; data augmentation; CHARACTER;
DOI
10.1109/ICFHR2020.2020.00024
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Analysis and recognition of historical documents face many challenges, one of which is the scarcity of the ground truth data needed for most machine learning techniques, and for deep learning in particular. In this paper, we present a novel approach that significantly augments the word image samples generated from an existing dataset of Khmer ancient palm leaf manuscripts. Instead of segmenting real Khmer words, we combine the annotated glyphs into groups called sub-syllables. A new text recognition method is also proposed to take into account the spatially complex structure of Khmer writing. The proposed method is composed of two main modules: a feature generator and a decoder. The generator uses convolutional blocks, inception blocks, and a bi-directional LSTM to encode information extracted from the input image, which the attention-based decoder then decodes to predict the final text transcription. Experiments are conducted on a new dataset of groups of sub-syllables constructed from annotated glyphs of the SleukRith Set.
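The augmentation idea in the abstract, building new training images by grouping already-annotated glyphs rather than segmenting real words, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy glyph labels, and the tiny bitmaps are all hypothetical, and the sketch assumes simple left-to-right pasting, which ignores the stacked, spatially complex layout of Khmer sub-syllables that the paper addresses.

```python
import random


def pad_to_height(img, height, fill=255):
    """Pad a glyph image (a list of pixel rows) at the bottom with background."""
    width = len(img[0])
    return img + [[fill] * width for _ in range(height - len(img))]


def compose_group(glyphs, k, rng):
    """Sample k annotated glyphs and paste them side by side.

    glyphs: list of (label, image) pairs; each image is a list of equal-length rows.
    Returns (labels, image) for one synthetic group.
    ASSUMPTION: plain horizontal concatenation, not the paper's grouping rule.
    """
    chosen = [rng.choice(glyphs) for _ in range(k)]
    height = max(len(img) for _, img in chosen)
    padded = [pad_to_height(img, height) for _, img in chosen]
    # Join the r-th row of every glyph so the glyphs sit next to each other.
    rows = [sum((p[r] for p in padded), []) for r in range(height)]
    labels = [label for label, _ in chosen]
    return labels, rows


# Hypothetical glyph set: placeholder labels with tiny placeholder bitmaps.
toy_glyphs = [
    ("ka", [[0, 255], [0, 0]]),
    ("aa", [[255, 0, 255]]),
    ("sub_ra", [[0], [0], [0]]),
]

rng = random.Random(0)  # fixed seed so the synthetic set is reproducible
samples = [compose_group(toy_glyphs, k=2, rng=rng) for _ in range(5)]
```

Each synthetic sample keeps its label sequence alongside the composed image, so a recognizer can be trained on it without any additional manual annotation.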
Pages: 73-78
Page count: 6