Improving Handwritten Mathematical Expression Recognition via Integrating Convolutional Neural Network With Transformer and Diffusion-Based Data Augmentation

Cited by: 0
Authors
Zhang, Yibo [1]
Li, Gaoxu [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Phys Sci & Engn, Beijing 100044, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Jiangsu, Peoples R China
Source
IEEE Access | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
CNN; data augmentation; denoising diffusion probabilistic model; DDPM; handwritten mathematical expression recognition; HMER; Transformer
DOI
10.1109/ACCESS.2024.3399919
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Handwritten mathematical expression recognition (HMER) poses a formidable challenge due to the intricate two-dimensional structures and diverse handwriting styles. This paper introduces a novel approach to improve HMER accuracy by employing an integrated, high-capacity architecture that combines Transformer and Convolutional Neural Network (CNN) models, along with a denoising diffusion probabilistic model (DDPM)-based data augmentation technique. We explore three combination strategies for an attention-based encoder-decoder (AED) HMER model: 1) The "Tandem" strategy, which harnesses CNN features within a Transformer encoder to capture global interdependencies; 2) The "Parallel" strategy, which integrates Transformer encoder outputs with CNN outputs to achieve comprehensive feature fusion; 3) The "Mixing" strategy, which introduces multi-head self-attention (MHSA) at the final stage of the CNN. We evaluate these methods using the CROHME benchmark dataset and conduct a detailed comparative analysis. All three approaches significantly enhance model performance. Notably, the "Tandem" approach achieves expression recognition rates (ExpRate) of 54.85% and 58.56% on the CROHME 2016 and 2019 test sets, respectively, while the "Parallel" method attains 55.63% and 57.39% on the same test sets. Furthermore, we introduce an innovative data augmentation approach that utilizes DDPM to generate synthetic training samples. The DDPM, conditioned on LaTeX-rendered images, bridges the gap between printed and handwritten expressions, enabling the creation of realistic, stylistically diverse handwriting samples. This augmentation boosts the ExpRates of all strategies on both CROHME 2016 and 2019 test sets, yielding improvements of 1.6-4.6% relative to the unaugmented dataset.
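The three combination strategies named in the abstract differ mainly in where self-attention sits relative to the CNN. The following is a minimal data-flow sketch in pure Python, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`cnn_stage`, `self_attention`, `tandem`, `parallel`, `mixing`) are hypothetical, a per-position linear map with ReLU stands in for a convolutional stage, single-head attention with identity projections stands in for MHSA and for a full Transformer encoder, the parallel branch is assumed to read the raw input, and fusion is assumed to be concatenation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = spatial positions, columns = feature channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Toy single-head scaled dot-product self-attention (identity Q/K/V projections)."""
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)  # pairwise similarity between positions
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)  # each output row mixes all positions


def cnn_stage(x: Matrix, w: Matrix) -> Matrix:
    """Stand-in for one CNN stage: per-position linear map followed by ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(x, w)]


def tandem(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Tandem: run the whole CNN first, then apply attention to its features."""
    feats = cnn_stage(cnn_stage(x, w1), w2)
    return self_attention(feats)


def parallel(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Parallel: CNN branch and attention branch side by side, fused by concatenation."""
    cnn_out = cnn_stage(cnn_stage(x, w1), w2)
    attn_out = self_attention(x)  # assumption: this branch reads the raw input
    return [c + a for c, a in zip(cnn_out, attn_out)]


def mixing(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Mixing: attention is placed inside the final CNN stage (ordering is assumed)."""
    h = cnn_stage(x, w1)
    return cnn_stage(self_attention(h), w2)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 3, 4  # 3 positions, 4 channels (arbitrary toy sizes)
    x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    w1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    w2 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    for name, fn in (("tandem", tandem), ("parallel", parallel), ("mixing", mixing)):
        out = fn(x, w1, w2)
        print(name, len(out), "x", len(out[0]))
```

In this sketch the tandem and mixing outputs keep the channel width of the CNN features, while the parallel output is twice as wide because the two branches are concatenated. A real model would use learned projections, multiple heads, and positional encodings, none of which are modeled here.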
Pages: 67945-67956
Page count: 12