A New Approach to Data Annotation Automation for Online Handwritten Mathematical Expression Recognition based on Recurrent Neural Networks

Cited by: 1
Authors
Zhelezniakov, Dmytro [1, 2]
Cherneha, Anastasiia [1, 2]
Zaytsev, Viktor [1]
Radyvonenko, Olga [1]
Affiliations
[1] Samsung R&D Inst, 57 Lva Tolstogo Str, Kiev, Ukraine
[2] Taras Shevchenko Natl Univ Kyiv, Kiev, Ukraine
Keywords
GENERATION;
DOI
10.1109/SMC52423.2021.9658867
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Modern recognition methods based on deep learning place high demands on the size of training data. However, such data is not always publicly available and is often too small or limited in the number of classes. Preparing ground truth data is expensive, time-consuming, and error-prone at both the collection and annotation stages for many applications, particularly optical character recognition and handwriting recognition. For recognition of 2-dimensional languages (diagrams, charts, mathematical formulas), annotation is further complicated: in addition to a large number of symbol classes that varies with the application, the spatial relations between symbols or classes must also be annotated. In this work, we propose an approach for automatic annotation of online handwritten mathematical expressions. This iterative approach produces a hierarchical annotation, starting from an LSTM-based recognition model and a small annotated dataset, and expands the alphabet step by step, gradually improving the recognition accuracy on new symbol classes. The proposed approach does not require prior verification of the gathered dataset and comprises three main stages: training recognition models, automatic annotation using recognition and matching algorithms, and automatic verification. These stages are repeated until the number of newly recognized and annotated samples becomes small enough. Samples that fail automatic verification are treated as suspicious and require manual verification or refinement, which is done at the last stage. In our experiment, more than 85% of the samples were annotated automatically, with a symbol-level annotation accuracy above 99%. Experimental results showed that the proposed approach saved up to 90% of the time spent on manual operations. The proposed approach can also be applied to high-noise datasets.
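The three-stage loop described in the abstract (train, annotate, verify, repeat until few new samples are accepted) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `iterative_annotation`, its parameters `min_new` and `max_rounds`, and the toy `train`, `annotate`, and `verify` callables are all assumptions made for this example. In particular, the toy "model" is just a set of known symbols standing in for the LSTM-based recognizer, and the toy matching rule (fill in at most one unknown symbol per expression) is invented to show how the alphabet can grow between rounds.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def iterative_annotation(
    seed: Dict[Hashable, tuple],
    unlabeled: List[Hashable],
    train: Callable,
    annotate: Callable,
    verify: Callable,
    min_new: int = 1,      # hypothetical stopping threshold
    max_rounds: int = 20,  # hypothetical safety cap on iterations
) -> Tuple[Dict[Hashable, tuple], List[Hashable]]:
    """Repeat train -> annotate -> verify until a round accepts fewer than min_new samples."""
    annotated = dict(seed)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        model = train(annotated)                 # stage 1: train on everything accepted so far
        accepted, rejected = {}, []
        for sample in pending:
            label = annotate(model, sample)      # stage 2: automatic annotation
            if verify(sample, label):            # stage 3: automatic verification
                accepted[sample] = label
            else:
                rejected.append(sample)
        annotated.update(accepted)
        pending = rejected
        if len(accepted) < min_new:
            break
    # Whatever is still pending is "suspicious" and left for manual review.
    return annotated, pending


# --- Toy stand-ins (assumptions for illustration only) ---

def train(annotated):
    # Toy "model": the set of symbols seen in accepted labels.
    return {sym for label in annotated.values() for sym in label}


def annotate(model, sample):
    # Toy recognition + matching: if at most one symbol is unknown,
    # assume matching can fill it in; otherwise mark unknowns as "?".
    unknown = [ch for ch in sample if ch not in model]
    if len(unknown) <= 1:
        return tuple(sample)
    return tuple(ch if ch in model else "?" for ch in sample)


def verify(sample, label):
    # Toy verification: reject any label that still has an unresolved symbol.
    return "?" not in label


seed = {"1+2": ("1", "+", "2")}
unlabeled = ["1+3", "3+4", "4-5", "x*y/z"]
annotated, pending = iterative_annotation(seed, unlabeled, train, annotate, verify)
print(sorted(annotated))  # expressions annotated automatically (plus the seed)
print(pending)            # expressions left for manual verification
```

In this toy run the alphabet grows by one symbol per round ("3", then "4"), after which no further sample passes verification and the loop stops, leaving the remaining expressions for manual review.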
Pages: 1125-1132
Page count: 8