Improving Handwritten Mathematical Expression Recognition via Integrating Convolutional Neural Network With Transformer and Diffusion-Based Data Augmentation

Times cited: 0
Authors
Zhang, Yibo [1]
Li, Gaoxu [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Phys Sci & Engn, Beijing 100044, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Jiangsu, Peoples R China
Source
IEEE Access, 2024, Vol. 12
Funding
National Natural Science Foundation of China
Keywords
CNN; data augmentation; denoising diffusion probabilistic model; DDPM; handwritten mathematical expression recognition; HMER; Transformer
DOI
10.1109/ACCESS.2024.3399919
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Handwritten mathematical expression recognition (HMER) is a formidable challenge because of the intricate two-dimensional structure of expressions and the diversity of handwriting styles. This paper improves HMER accuracy with an integrated, high-capacity architecture that combines Transformer and convolutional neural network (CNN) models, together with a data augmentation technique based on a denoising diffusion probabilistic model (DDPM). We explore three strategies for combining the two in an attention-based encoder-decoder (AED) HMER model: 1) the "Tandem" strategy, which processes CNN features with a Transformer encoder to capture global interdependencies; 2) the "Parallel" strategy, which combines Transformer encoder outputs with CNN outputs to achieve comprehensive feature fusion; 3) the "Mixing" strategy, which introduces multi-head self-attention (MHSA) at the final stage of the CNN. We evaluate these methods on the CROHME benchmark dataset and present a detailed comparative analysis. All three strategies significantly improve model performance. Notably, the "Tandem" strategy achieves expression recognition rates (ExpRate) of 54.85% and 58.56% on the CROHME 2016 and 2019 test sets, respectively, while the "Parallel" strategy attains 55.63% and 57.39% on the same test sets. Furthermore, we introduce a data augmentation approach that uses a DDPM to generate synthetic training samples. The DDPM, conditioned on LaTeX-rendered images, bridges the gap between printed and handwritten expressions, enabling the creation of realistic, stylistically diverse handwriting samples. This augmentation raises the ExpRate of every strategy on both the CROHME 2016 and 2019 test sets, with improvements of 1.6-4.6% over training on the unaugmented dataset.
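The abstract does not specify the diffusion setup. As a minimal sketch, assuming the standard DDPM forward (noising) process with a linear β schedule from 1e-4 to 0.02 over T = 1000 steps (common DDPM defaults, not values taken from this paper), the code below shows how a clean handwriting image x_0 is corrupted to x_t in closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ᾱ_t = ∏_{s≤t}(1 − β_s) and ε ~ N(0, I). A denoiser trained to predict ε from x_t (in the conditional setting described above, also receiving the LaTeX-rendered image as an extra input) is what later generates synthetic samples; the denoiser, the conditioning, and the reverse sampler are not shown. The function names and the toy 2x2 image are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T spaced linearly (assumed schedule, not from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form and return (x_t, noise).

    x0 is a flattened image with pixels scaled to [-1, 1]; t is a 0-based step index.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * p + math.sqrt(1.0 - a) * e for p, e in zip(x0, noise)]
    return xt, noise


# Hypothetical toy example: a 2x2 "stroke" image, flattened and scaled to [-1, 1].
rng = random.Random(0)
betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
x0 = [-1.0, 1.0, 1.0, -1.0]
x_early, _ = q_sample(x0, 0, alpha_bars, rng)    # still almost identical to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
print(f"alpha_bar at t=0: {alpha_bars[0]:.4f}, at t=999: {alpha_bars[-1]:.2e}")
```

Because x_t can be drawn for any t in a single step rather than by iterating the chain, training pairs (x_t, t, ε) for the denoiser are cheap to produce.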
Pages: 67945-67956
Number of pages: 12