Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition

Cited by: 135
Authors
Zhang, Jianshu [1]
Du, Jun [1]
Zhang, Shiliang [1]
Liu, Dan [2]
Hu, Yulong [2]
Hu, Jinshui [2]
Wei, Si [2]
Dai, Lirong [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, Anhui, People's Republic of China
[2] iFLYTEK Research, Hefei, Anhui, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Handwritten mathematical expression recognition; Neural network; Attention; Features
DOI
10.1016/j.patcog.2017.06.017
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine recognition of a handwritten mathematical expression (HME) is challenging due to the ambiguities of handwritten symbols and the two-dimensional structure of mathematical expressions. Inspired by recent work in deep learning, we present Watch, Attend and Parse (WAP), a novel end-to-end neural network approach that learns to recognize HMEs in a two-dimensional layout and outputs them as one-dimensional character sequences in LaTeX format. Unlike traditional methods, the proposed model avoids problems that stem from symbol segmentation and does not require a predefined expression grammar. Symbol recognition and structural analysis are handled by a watcher and a parser, respectively. The watcher is a convolutional neural network encoder that takes HME images as input; the parser is a recurrent neural network decoder equipped with an attention mechanism that generates LaTeX sequences. The correspondence between the input expressions and the output LaTeX sequences is learned automatically by the attention mechanism. We validate the proposed approach on a benchmark published by the CROHME international competition. Using the official training dataset, WAP significantly outperformed the state-of-the-art method, with an expression recognition accuracy of 46.55% on CROHME 2014 and 44.55% on CROHME 2016. © 2017 Elsevier Ltd. All rights reserved.
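To make the role of the attention mechanism concrete, the following is a minimal sketch of one generic additive-attention step, in which a decoder state scores every position of a flattened feature grid and forms a weighted context vector. It is an illustration under stated assumptions, not the WAP implementation: the function names (`softmax`, `matvec`, `attend`), the parameters `W_s`, `U_a`, `v_a`, and all toy values are hypothetical, and the record above does not specify the paper's exact attention formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def attend(annotations, state, W_s, U_a, v_a):
    """One additive-attention step (generic form, assumed for illustration).

    annotations: feature vectors, one per position of a flattened feature grid
    state:       current decoder hidden state
    W_s, U_a, v_a: hypothetical learned parameters (fixed toy values below)
    Returns (weights, context): a distribution over positions and the
    weighted sum of the annotation vectors.
    """
    proj_state = matvec(W_s, state)
    scores = []
    for a in annotations:
        proj_a = matvec(U_a, a)
        hidden = [math.tanh(p + q) for p, q in zip(proj_state, proj_a)]
        scores.append(sum(vi * hi for vi, hi in zip(v_a, hidden)))
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * a[d] for w, a in zip(weights, annotations))
               for d in range(dim)]
    return weights, context

# Toy example: a 2 x 2 feature grid flattened to four 2-dimensional annotations.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
state = [0.5, -0.5]
W_s = [[1.0, 0.0], [0.0, 1.0]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]

weights, context = attend(annotations, state, W_s, U_a, v_a)
print(weights)
print(context)
```

In an encoder-decoder recognizer of this kind, a step like this would run once per output token, so each generated LaTeX symbol can draw on a different region of the image. The weights can also be reshaped back to the grid to visualize where the model is looking.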
Pages: 196-206
Page count: 11
Related papers
50 in total
  • [1] Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition
    Zhang, Jianshu
    Du, Jun
    Dai, Lirong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (01) : 221 - 233
  • [2] Training an End-to-End System for Handwritten Mathematical Expression Recognition by Generated Patterns
    Anh Duc Le
    Nakagawa, Masaki
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1056 - 1061
  • [3] Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
    Bluche, Theodore
    Louradour, Jerome
    Messina, Ronaldo
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1050 - 1055
  • [4] Improvement of End-to-End Offline Handwritten Mathematical Expression Recognition by Weakly Supervised Learning
    Thanh-Nghia Truong
    Cuong Tuan Nguyen
    Khanh Minh Phan
    Nakagawa, Masaki
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 181 - 186
  • [5] Combining CNN and Transformer as Encoder to Improve End-to-End Handwritten Mathematical Expression Recognition Accuracy
    Zhang, Zhang
    Zhang, Yibo
    FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 185 - 197
  • [6] End-to-End Speech Emotion Recognition Based on Neural Network
    Zhu, Bing
    Zhou, Wenkai
    Wang, Yutian
    Wang, Hui
    Cai, Juan Juan
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017, : 1634 - 1638
  • [7] An End-to-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings
    Hochuli, Andre G.
    Britto, Alceu S., Jr.
    Barddal, Jean P.
    Oliveira, Luiz E. S.
    Sabourin, Robert
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [8] On usage of an end-to-end deep neural architecture for handwritten digit string recognition
    Omidi, Zahra
    BabaAli, Bagher
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3009 - 3020
  • [10] Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model
    Carbonell, Manuel
    Villegas, Mauricio
    Fornes, Alicia
    Llados, Josep
    2018 13TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS), 2018, : 399 - 404