Improving Handwritten Mathematical Expression Recognition via Integrating Convolutional Neural Network With Transformer and Diffusion-Based Data Augmentation

Cited by: 0
Authors
Zhang, Yibo [1]
Li, Gaoxu [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Phys Sci & Engn, Beijing 100044, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Jiangsu, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
CNN; data augmentation; denoising diffusion probabilistic model; DDPM; handwritten mathematical expression recognition; HMER; Transformer
DOI
10.1109/ACCESS.2024.3399919
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Handwritten mathematical expression recognition (HMER) poses a formidable challenge due to the intricate two-dimensional structures and diverse handwriting styles. This paper introduces a novel approach to improve HMER accuracy by employing an integrated, high-capacity architecture that combines Transformer and Convolutional Neural Network (CNN) models, along with a denoising diffusion probabilistic model (DDPM)-based data augmentation technique. We explore three combination strategies for an attention-based encoder-decoder (AED) HMER model: 1) The "Tandem" strategy, which harnesses CNN features within a Transformer encoder to capture global interdependencies; 2) The "Parallel" strategy, which integrates Transformer encoder outputs with CNN outputs to achieve comprehensive feature fusion; 3) The "Mixing" strategy, which introduces multi-head self-attention (MHSA) at the final stage of the CNN. We evaluate these methods using the CROHME benchmark dataset and conduct a detailed comparative analysis. All three approaches significantly enhance model performance. Notably, the "Tandem" approach achieves expression recognition rates (ExpRate) of 54.85% and 58.56% on the CROHME 2016 and 2019 test sets, respectively, while the "Parallel" method attains 55.63% and 57.39% on the same test sets. Furthermore, we introduce an innovative data augmentation approach that utilizes DDPM to generate synthetic training samples. The DDPM, conditioned on LaTeX-rendered images, bridges the gap between printed and handwritten expressions, enabling the creation of realistic, stylistically diverse handwriting samples. This augmentation boosts the ExpRates of all strategies on both CROHME 2016 and 2019 test sets, yielding improvements of 1.6-4.6% relative to the unaugmented dataset.
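The ExpRate figures quoted in the abstract are expression-level recognition rates: a prediction counts only if the whole expression matches the reference. The following is a minimal sketch of how such a metric is commonly computed, assuming predictions and references are space-separated LaTeX token strings. The function name `exp_rate` and the sample strings are hypothetical illustrations, not taken from the paper or from the CROHME evaluation tools.

```python
from typing import Sequence


def exp_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the fraction of expressions whose token sequence matches exactly.

    Assumption: each string is a LaTeX expression with tokens separated by
    whitespace, so splitting on whitespace yields the token sequence.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression is correct only if every token matches, in order.
    correct = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical sample data: the second prediction has one wrong symbol.
preds = ["x ^ { 2 } + 1", "\\frac { a } { b }", "a + b", "\\sqrt { x }"]
refs = ["x ^ { 2 } + 1", "\\frac { a } { c }", "a + b", "\\sqrt { x }"]

rate = exp_rate(preds, refs)
print(f"ExpRate: {rate:.2%}")  # prints "ExpRate: 75.00%"
```

A single wrong symbol makes the whole expression count as incorrect, which is why ExpRate values on long handwritten expressions tend to be much lower than symbol-level accuracy.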
Pages: 67945-67956
Page count: 12