Towards efficient unconstrained handwriting recognition using Dilated Temporal Convolution Network

Cited by: 20
Authors
Sharma A. [1]
Jayagopi D.B. [1]
Affiliations
[1] Multimodal Perception Lab, International Institute of Information Technology - Bangalore (IIIT-B), Bangalore
Keywords
Dilated Temporal Convolution Network; Document analysis; Handwriting recognition
DOI
10.1016/j.eswa.2020.114004
Abstract
Recognition of cursive handwriting from images has advanced considerably with recent recurrent architectures and attention mechanisms. Most work focuses on improving transcription performance in terms of Character Error Rate (CER) and Word Error Rate (WER), while existing models remain slow to train and test. Furthermore, recent studies recommend that models be not only effective in terms of task performance but also environmentally friendly in terms of their carbon footprint. Reviewing recent state-of-the-art models, these studies recommend considering training and retraining time when designing a model. High training time increases costs in both resources and carbon footprint. This is challenging for handwriting recognition models built on popular recurrent architectures, and it is especially critical because line images are usually very wide, which results in long sequences to decode. In this work, we present a fully convolutional deep network architecture for cursive handwriting recognition from line-level images. The architecture combines 2-D convolutions and 1-D dilated non-causal convolutions with a Connectionist Temporal Classification (CTC) output layer, which offers high parallelism with a smaller number of parameters. We further report experiments with various image re-scaling factors and how they affect the performance of the proposed model, and we analyze a data augmentation pipeline used during model training. The experiments show that our model performs comparably to recurrent architectures on the CER and WER measures. We compare it with state-of-the-art models based on Recurrent Neural Networks (RNNs) and their variants. The analysis reports training performance and network details on three different datasets of English and French handwriting. It shows that our model has fewer parameters and takes less training and testing time, making it suitable for low-resource and environment-friendly deployment. © 2020
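As a rough illustration of the 1-D dilated non-causal convolutions named in the abstract, the following minimal sketch in plain Python shows how a short stack of such layers can cover a long sequence. It is not the authors' implementation: the helper names `dilated_conv1d` and `receptive_field`, the kernel size of 3, and the dilation schedule 1, 2, 4, 8 are all assumptions chosen for illustration, and the 2-D convolutional front end and CTC output layer are omitted.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Non-causal, zero-padded 1-D dilated convolution over a list of floats.

    Kernel taps are centred on the current step, so each output depends on
    both earlier and later inputs. The kernel length is assumed to be odd.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # symmetric offsets: past and future
            if 0 <= idx < len(seq):          # treat positions outside as zeros
                acc += w * seq[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps that influence one output after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings, not taken from the paper.
dilations = [1, 2, 4, 8]
signal = [0.0] * 31
signal[15] = 1.0  # unit impulse in the middle of the sequence

out = signal
for d in dilations:
    out = dilated_conv1d(out, [1.0, 1.0, 1.0], d)

# Positions reached by the impulse after four layers.
touched = [i for i, v in enumerate(out) if v != 0.0]
print(receptive_field(3, dilations))  # 31
print(touched[0], touched[-1])        # 0 30
```

With these assumed settings, four layers of kernel size 3 already span 31 time steps in both directions around the centre, and every output position can be computed independently of the others. This is the kind of parallelism over long line-image sequences that the abstract contrasts with recurrent decoding.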