A New Approach to Data Annotation Automation for Online Handwritten Mathematical Expression Recognition based on Recurrent Neural Networks

Cited by: 1
Authors
Zhelezniakov, Dmytro [1, 2]
Cherneha, Anastasiia [1, 2]
Zaytsev, Viktor [1]
Radyvonenko, Olga [1]
Affiliations
[1] Samsung R&D Institute, 57 Lva Tolstogo Str., Kiev, Ukraine
[2] Taras Shevchenko National University of Kyiv, Kiev, Ukraine
Source
2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2021
Keywords
GENERATION
DOI
10.1109/SMC52423.2021.9658867
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Modern deep-learning-based recognition methods require large amounts of training data. However, such data is not always publicly available and is often too small or covers only a limited number of classes. For many applications, particularly optical character recognition and handwriting recognition, preparing ground-truth data is expensive, time-consuming, and error-prone at both the collection and the annotation stages. For two-dimensional languages such as diagrams, charts, and mathematical formulas, annotation is harder still: in addition to a large, application-dependent set of symbol classes, the spatial relations between symbols must also be annotated. In this work, we propose an approach for the automatic annotation of online handwritten mathematical expressions. Starting from a small annotated dataset, this iterative approach produces hierarchical annotations with an LSTM-based recognition model and progressively extends the alphabet, gradually improving recognition accuracy on newly added symbol classes. The approach does not require prior verification of the collected dataset and comprises three main stages: training recognition models, automatic annotation using recognition and matching algorithms, and automatic verification. These stages are repeated until the number of newly recognized and annotated samples becomes sufficiently small. Samples that fail automatic verification are considered suspicious and are manually verified or corrected in a final stage. In our experiment, more than 85% of the samples were annotated automatically, with a symbol-level annotation accuracy above 99%. Experimental results show that the approach reduces the time spent on manual operations by up to 90%. The approach can also be applied to high-noise datasets.
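The three-stage loop described in the abstract can be sketched as a generic self-training loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `iterative_annotation`, `train`, `annotate`, `verify`, `min_new`, and `max_iters` are hypothetical; the LSTM recognizer, the matching step, and the verification rule are passed in as placeholder callables; and "sufficiently small" is modeled as a fixed threshold `min_new` on the number of samples accepted per iteration.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

# Hypothetical sketch of an iterative auto-annotation loop (not the paper's code).
S = TypeVar("S")  # a raw sample, e.g. the strokes of one handwritten expression
A = TypeVar("A")  # an annotation, e.g. symbol labels plus spatial relations
M = TypeVar("M")  # a trained recognition model (an LSTM in the paper)


def iterative_annotation(
    seed: List[Tuple[S, A]],
    unlabeled: List[S],
    train: Callable[[List[Tuple[S, A]]], M],
    annotate: Callable[[M, S], Optional[A]],
    verify: Callable[[S, A], bool],
    min_new: int = 1,
    max_iters: int = 50,
) -> Tuple[List[Tuple[S, A]], List[S]]:
    """Repeat train -> annotate -> verify until too few new samples are accepted.

    Returns (annotated, suspicious); the suspicious samples are the ones that
    would be sent to manual verification in the final stage.
    """
    annotated = list(seed)
    pending = list(unlabeled)
    for _ in range(max_iters):
        model = train(annotated)  # stage 1: retrain on everything accepted so far
        accepted: List[Tuple[S, A]] = []
        rejected: List[S] = []
        for sample in pending:
            candidate = annotate(model, sample)  # stage 2: recognition + matching
            if candidate is not None and verify(sample, candidate):  # stage 3
                accepted.append((sample, candidate))
            else:
                rejected.append(sample)
        annotated.extend(accepted)
        pending = rejected
        if len(accepted) < min_new:  # assumed stopping rule: few new samples
            break
    return annotated, pending


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): a "sample" is an integer, the "model" is the
    # largest integer annotated so far, and it can only annotate samples at most
    # one step beyond what it has already seen.
    def train(data):
        return max(s for s, _ in data)

    def annotate(model, s):
        return str(s) if s <= model + 1 else None

    def verify(s, a):
        return a == str(s)

    done, suspicious = iterative_annotation([(0, "0")], [3, 1, 2, 7], train, annotate, verify)
    print(len(done), suspicious)  # prints: 4 [7]
```

In the toy run, each retraining lets the model reach one step further, loosely mirroring how a growing annotated set lets later iterations handle samples (or symbol classes) the seed model could not. The outlier `7` is never reached and remains pending, playing the role of a suspicious sample left for manual review.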
Pages: 1125-1132
Number of pages: 8