Improving Handwritten Mathematical Expression Recognition via Integrating Convolutional Neural Network With Transformer and Diffusion-Based Data Augmentation

Times cited: 0
Authors
Zhang, Yibo [1]
Li, Gaoxu [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Phys Sci & Engn, Beijing 100044, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215123, Jiangsu, Peoples R China
Source
IEEE Access | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
CNN; data augmentation; denoising diffusion probabilistic model; DDPM; handwritten mathematical expression recognition; HMER; Transformer
DOI
10.1109/ACCESS.2024.3399919
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Handwritten mathematical expression recognition (HMER) poses a formidable challenge due to the intricate two-dimensional structures and diverse handwriting styles. This paper introduces a novel approach to improve HMER accuracy by employing an integrated, high-capacity architecture that combines Transformer and Convolutional Neural Network (CNN) models, along with a denoising diffusion probabilistic model (DDPM)-based data augmentation technique. We explore three combination strategies for an attention-based encoder-decoder (AED) HMER model: 1) The "Tandem" strategy, which harnesses CNN features within a Transformer encoder to capture global interdependencies; 2) The "Parallel" strategy, which integrates Transformer encoder outputs with CNN outputs to achieve comprehensive feature fusion; 3) The "Mixing" strategy, which introduces multi-head self-attention (MHSA) at the final stage of the CNN. We evaluate these methods using the CROHME benchmark dataset and conduct a detailed comparative analysis. All three approaches significantly enhance model performance. Notably, the "Tandem" approach achieves expression recognition rates (ExpRate) of 54.85% and 58.56% on the CROHME 2016 and 2019 test sets, respectively, while the "Parallel" method attains 55.63% and 57.39% on the same test sets. Furthermore, we introduce an innovative data augmentation approach that utilizes DDPM to generate synthetic training samples. The DDPM, conditioned on LaTeX-rendered images, bridges the gap between printed and handwritten expressions, enabling the creation of realistic, stylistically diverse handwriting samples. This augmentation boosts the ExpRates of all strategies on both CROHME 2016 and 2019 test sets, yielding improvements of 1.6-4.6% relative to the unaugmented dataset.
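The abstract reports every result as an expression recognition rate (ExpRate) but does not define the metric. The sketch below assumes the convention commonly used with CROHME: ExpRate is the percentage of test expressions whose predicted LaTeX token sequence matches the ground truth exactly. The function name `exp_rate` and the toy token sequences are hypothetical illustrations, not taken from the paper or from the CROHME data.

```python
from typing import Sequence


def exp_rate(predictions: Sequence[Sequence[str]],
             references: Sequence[Sequence[str]]) -> float:
    """Return the percentage of expressions predicted exactly right.

    Assumption: an expression counts as correct only if its whole
    token sequence equals the reference token sequence.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    exact = sum(1 for p, r in zip(predictions, references) if list(p) == list(r))
    return 100.0 * exact / len(references)


# Hypothetical toy data: LaTeX token sequences, not CROHME samples.
refs = [
    ["x", "^", "2"],
    ["\\frac", "{", "a", "}", "{", "b", "}"],
    ["a", "+", "b"],
    ["\\sqrt", "{", "x", "}"],
]
preds = [
    ["x", "^", "2"],
    ["\\frac", "{", "a", "}", "{", "6", "}"],  # one wrong symbol fails the whole expression
    ["a", "+", "b"],
    ["\\sqrt", "{", "x", "}"],
]

print(exp_rate(preds, refs))  # 75.0
```

Under this assumed definition a single misrecognized symbol makes the whole expression count as wrong, which is why ExpRate figures sit well below symbol-level accuracy.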
Pages: 67945-67956
Page count: 12