A New Approach to Data Annotation Automation for Online Handwritten Mathematical Expression Recognition based on Recurrent Neural Networks

Cited by: 1
Authors
Zhelezniakov, Dmytro [1, 2]
Cherneha, Anastasiia [1, 2]
Zaytsev, Viktor [1]
Radyvonenko, Olga [1]
Affiliations
[1] Samsung R&D Inst, 57 Lva Tolstogo Str, Kiev, Ukraine
[2] Taras Shevchenko Natl Univ Kyiv, Kiev, Ukraine
Keywords
GENERATION;
DOI
10.1109/SMC52423.2021.9658867
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The modern recognition methods based on deep learning have established high requirements for the size of training data. However, such data is not always publicly available, often undersized, or limited by the number of classes. Preparing ground truth data is very expensive, time-consuming, and error-prone during collecting as well as annotation for many applications, particularly for optical character recognition and handwriting recognition. In many applications, such as recognition of 2-dimensional languages (diagrams, charts, mathematical formulas), annotation is further complicated by the fact that in addition to the large number of symbol classes that vary depending on the application, the spatial relations between symbols or classes must also be annotated. In this work, we propose an approach for automatic annotation of online handwritten mathematical expressions. This iterative approach provides a hierarchical annotation using an LSTM-based recognition model and a small annotated dataset as a starting point and provides an increase in the alphabet, gradually improving the recognition accuracy of new classes of symbols. The proposed approach does not imply prior verification of the gathered dataset and comprises three main stages: training recognition models, automatic annotation using recognition and matching algorithms, and automatic verification. These stages are repeated until the number of new automatically recognized and annotated samples becomes small enough. Samples that have not passed automatic verification are suspicious and require manual verification or refining, which is done at the last stage. In our experiment, more than 85% of the samples were automatically annotated. The annotation accuracy at the symbol level is more than 99%. Experimental results demonstrated that the proposed approach provided time-saving of up to 90% on manual operations. The proposed approach can also be applied to high-noise datasets.
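The three-stage loop described in the abstract (train, annotate automatically, verify automatically, repeat until few new samples are accepted) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `iterative_annotation`, its parameters `min_new` and `max_rounds`, and the toy `train`, `annotate`, and `verify` callbacks are all hypothetical stand-ins. In particular, the toy "model" is just a set of known symbols rather than an LSTM-based recognizer.

```python
from typing import Any, Callable, List, Tuple


def iterative_annotation(
    seed: List[Tuple[Any, Any]],
    unlabeled: List[Any],
    train: Callable[[List[Tuple[Any, Any]]], Any],
    annotate: Callable[[Any, Any], Any],
    verify: Callable[[Any, Any], bool],
    min_new: int = 1,
    max_rounds: int = 20,
) -> Tuple[List[Tuple[Any, Any]], List[Any]]:
    """Grow an annotated set from a small seed (hypothetical sketch).

    Returns the annotated pairs and the samples that never passed
    automatic verification and are left for manual review.
    """
    labeled = list(seed)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)              # stage 1: train on current labels
        still_pending = []
        new_count = 0
        for sample in pending:
            annotation = annotate(model, sample)   # stage 2: auto-annotate
            if annotation is not None and verify(sample, annotation):  # stage 3
                labeled.append((sample, annotation))
                new_count += 1
            else:
                still_pending.append(sample)
        pending = still_pending
        # Stop once a round adds too few new samples, or nothing is left.
        if new_count < min_new or not pending:
            break
    return labeled, pending


# Toy stand-ins (assumptions for illustration only).
def train(labeled):
    # "Model" = set of symbols seen so far in the annotations.
    return {symbol for _, annotation in labeled for symbol in annotation}


def annotate(model, sample):
    # Accept a sample only if it introduces at most one unseen symbol,
    # so the alphabet grows gradually between rounds.
    unknown = set(sample) - model
    return list(sample) if len(unknown) <= 1 else None


def verify(sample, annotation):
    # Trivial consistency check: one annotated symbol per input symbol.
    return len(annotation) == len(sample)


seed = [("1+2", ["1", "+", "2"])]
unlabeled = ["2+3", "3-1", "x/y"]
labeled, pending = iterative_annotation(seed, unlabeled, train, annotate, verify)
```

In this toy run, "2+3" is accepted in the first round, "3-1" in the second once "3" has entered the alphabet, and "x/y" stays in `pending`, mirroring the samples that the abstract says are left for manual verification at the last stage.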
Pages: 1125-1132
Page count: 8
Related papers
50 in total
  • [1] Online handwritten mathematical expression recognition
    Buyukbayrak, Hakan
    Yanikoglu, Berrin
    Ercil, Aytul
    DOCUMENT RECOGNITION AND RETRIEVAL XIV, 2007, 6500
  • [2] Online handwritten mathematical expression recognition
    Buyukbayrak, Hakan
    Yanikoglu, Berrin
    Ercil, Aytul
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 730 - +
  • [3] Handwritten Mathematical Expression Recognition: An approach on data augmentation
    Khanh-Ngoc Bui
    Quoc-Kim-Hoang Nguyen
    Thanh-Sach Le
    2021 15TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND APPLICATIONS (ACOMP 2021), 2021, : 46 - 53
  • [4] A global learning approach for an online handwritten mathematical expression recognition system
    Awal, Ahmad-Montaser
    Mouchere, Harold
    Viard-Gaudin, Christian
    PATTERN RECOGNITION LETTERS, 2014, 35 : 68 - 77
  • [5] Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition
    Wu, Changjie
    Wang, Qing
    Zhang, Jianshu
    Du, Jun
    Wang, Jiaming
    Wu, Jiajia
    Hu, Jinshui
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2943 - 2949
  • [6] Unconstrained online handwritten Uyghur word recognition based on recurrent neural networks and connectionist temporal classification
    Ibrayim, Mayire
    Simayi, Wujiahematiti
    Hamdulla, Askar
    INTERNATIONAL JOURNAL OF BIOMETRICS, 2021, 13 (01) : 51 - 63
  • [7] Online Handwritten Mathematical Expression Recognition and Applications: A Survey
    Zhelezniakov, Dmytro
    Zaytsev, Viktor
    Radyvonenko, Olga
    IEEE ACCESS, 2021, 9 : 38352 - 38373
  • [8] A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition
    Zhang, Jianshu
    Du, Jun
    Dai, Lirong
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 902 - 907
  • [9] SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition
    Zhang, Jianshu
    Du, Jun
    Yang, Yongxin
    Song, Yi-Zhe
    Dai, Lirong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2471 - 2480
  • [10] A Neural Network Model for Online Handwritten Mathematical Symbol Recognition
    Thammano, Arit
    Rugkunchon, Sukhumal
    INTELLIGENT COMPUTING, PART I: INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, ICIC 2006, PART I, 2006, 4113 : 292 - 298