Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition

被引:135
|
作者
Zhang, Jianshu [1 ]
Du, Jun [1 ]
Zhang, Shiliang [1 ]
Liu, Dan [2 ]
Hu, Yulong [2 ]
Hu, Jinshui [2 ]
Wei, Si [2 ]
Dai, Lirong [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] IFLYTEK Res, Hefei, Anhui, Peoples R China
基金
中国国家自然科学基金;
关键词
Handwritten mathematical expression; recognition; Neural network; Attention; FEATURES;
D O I
10.1016/j.patcog.2017.06.017
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Machine recognition of a handwritten mathematical expression (HME) is challenging due to the ambiguities of handwritten symbols and the two-dimensional structure of mathematical expressions. Inspired by recent work in deep learning, we present Watch, Attend and Parse (WAP), a novel end-to-end approach based on neural network that learns to recognize HMEs in a two-dimensional layout and outputs them as one-dimensional character sequences in LaTeX format. Inherently unlike traditional methods, our proposed model avoids problems that stem from symbol segmentation, and it does not require a predefined expression grammar. Meanwhile, the problems of symbol recognition and structural analysis are handled, respectively, using a watcher and a parser. We employ a convolutional neural network encoder that takes HME images as input as the watcher and employ a recurrent neural network decoder equipped with an attention mechanism as the parser to generate LaTeX sequences. Moreover, the correspondence between the input expressions and the output LaTeX sequences is learned automatically by the attention mechanism. We validate the proposed approach on a benchmark published by the CROHME international competition. Using the official training dataset, WAP significantly outperformed the state-of-the-art method with an expression recognition accuracy of 46.55% on CROHME 2014 and 44.55% on CROHME 2016. (C) 2017 Elsevier Ltd. All rights reserved.
引用
收藏
页码:196 / 206
页数:11
相关论文
共 50 条
  • [41] Hypothesis Correction Based on Semi-character Recurrent Neural Network for End-to-end Speech Recognition
    Tachioka, Yuuki
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 862 - 867
  • [42] An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
    Shi, Baoguang
    Bai, Xiang
    Yao, Cong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (11) : 2298 - 2304
  • [43] Insights on Neural Representations for End-to-End Speech Recognition
    Ollerenshaw, Anna
    Jalal, Asif
    Hain, Thomas
    INTERSPEECH 2021, 2021, : 4079 - 4083
  • [44] End-to-End Radar HRRP Target Recognition Based on Integrated Denoising and Recognition Network
    Liu, Xiaodan
    Wang, Li
    Bai, Xueru
    REMOTE SENSING, 2022, 14 (20)
  • [45] EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION
    Battenberg, Eric
    Chen, Jitong
    Child, Rewon
    Coates, Adam
    Gaur, Yashesh
    Li, Yi
    Liu, Hairong
    Satheesh, Sanjeev
    Sriram, Anuroop
    Zhu, Zhenyao
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 206 - 213
  • [46] End-to-End Neural Segmental Models for Speech Recognition
    Tang, Hao
    Lu, Liang
    Kong, Lingpeng
    Gimpel, Kevin
    Livescu, Karen
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1254 - 1264
  • [47] End-to-End Text Recognition with Convolutional Neural Networks
    Wang, Tao
    Wu, David J.
    Coates, Adam
    Ng, Andrew Y.
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 3304 - 3308
  • [48] Attention-based neural network for end-to-end music separation
    Wang, Jing
    Liu, Hanyue
    Ying, Haorong
    Qiu, Chuhan
    Li, Jingxin
    Anwar, Muhammad Shahid
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (02) : 355 - 363
  • [49] END-TO-END ROAD GRAPH EXTRACTION BASED ON GRAPH NEURAL NETWORK
    Yang, Chengkai
    Todoran, Ion-George
    Saravia, Christian
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4887 - 4890
  • [50] Lane detection in complex scenes based on end-to-end neural network
    Liu, Wenbo
    Yan, Fei
    Tang, Kuan
    Zhang, Jiyong
    Deng, Tao
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 4300 - 4305