Towards efficient unconstrained handwriting recognition using Dilated Temporal Convolution Network

Cited by: 20
Authors
Sharma A. [1]
Jayagopi D.B. [1]
Affiliation
[1] Multimodal Perception Lab, International Institute of Information Technology - Bangalore (IIIT-B), Bangalore
Keywords
Dilated Temporal Convolution Network; Document analysis; Handwriting recognition
DOI
10.1016/j.eswa.2020.114004
Abstract
Recognition of cursive handwritten images has advanced considerably with recent recurrent architectures and attention mechanisms. Most work focuses on improving transcription performance in terms of Character Error Rate (CER) and Word Error Rate (WER), yet existing models are slow to train and to test. Furthermore, recent studies have recommended that models be not only efficient in terms of task performance but also environmentally friendly in terms of their carbon footprint. Based on a review of recent state-of-the-art models, these studies recommend taking training and retraining time into account when designing models. Long training times increase costs not only in computational resources but also in carbon footprint. This is especially challenging for handwriting recognition models built on popular recurrent architectures, because line images are usually very wide, which results in long sequences to decode. In this work, we present a fully convolutional deep network architecture for cursive handwriting recognition from line-level images. The architecture combines 2-D convolutions and 1-D dilated non-causal convolutions with a Connectionist Temporal Classification (CTC) output layer, which offers high parallelism with fewer parameters. We further present experiments with various image re-scaling factors and show how they affect the performance of the proposed model, and we analyze a data augmentation pipeline used during training. The experiments show that our model achieves CER and WER comparable to those of recurrent architectures. We compare it with state-of-the-art models of different architectures based on Recurrent Neural Networks (RNNs) and their variants, and report training performance and network details on three datasets of English and French handwriting.
The comparison shows that our model has fewer parameters and requires less training and testing time, making it suitable for low-resource and environmentally friendly deployment. © 2020
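The abstract names 1-D dilated non-causal convolutions as the sequence-modelling part of the network. As an illustration only, the pure-Python sketch below shows a single-channel non-causal dilated convolution with symmetric zero padding, so the output keeps the input width and each column sees both earlier and later columns, together with the receptive-field arithmetic for a stack of such layers. The kernel size, the doubling dilation schedule and the names `dilated_conv1d_noncausal` and `receptive_field` are hypothetical choices, not the authors' configuration.

```python
from typing import List, Sequence

# Hypothetical helper names for illustration; not taken from the paper.


def dilated_conv1d_noncausal(x: Sequence[float], kernel: Sequence[float],
                             dilation: int) -> List[float]:
    """Single-channel, stride-1, non-causal dilated 1-D convolution.

    Implemented as cross-correlation (what deep-learning libraries call
    "convolution") with symmetric zero padding, so len(output) == len(x).
    An odd kernel size is required for the padding to be symmetric.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size for symmetric padding")
    half = dilation * (k - 1) // 2  # number of time steps reached on each side
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - half + j * dilation  # taps reach both past and future columns
            if 0 <= idx < len(x):          # positions outside the input act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in time steps, of stacked stride-1 dilated convolutions."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print(dilated_conv1d_noncausal(impulse, [1, 2, 3], dilation=2))
# [0.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0]
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

With kernel size 3 and dilations 1, 2, 4, 8, four layers already cover 31 columns, and every output column of every layer can be computed independently of the others. This is the kind of parallelism the abstract attributes to the convolutional design, in contrast to the step-by-step computation of a recurrent layer.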
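A CTC output layer emits, for every column of the final feature sequence, a probability distribution over the character set plus a special blank symbol. The sketch below shows minimal best-path (greedy) decoding: take the most likely symbol per column, merge consecutive repeats, then drop blanks. It assumes that the blank is at index 0 and that indices 1..N map to a hypothetical `alphabet` string. The abstract does not say whether the authors decode greedily, with beam search or with a language model.

```python
from typing import Sequence

# Assumption: index 0 is the CTC blank; index i > 0 is alphabet[i - 1].


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        # Emit only when the label changes and is not the blank.
        if idx != prev and idx != 0:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


# Columns x (blank, 'a', 'b') probabilities; the argmax path is 1,1,0,1,2,2,0.
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.9, 0.05, 0.05],
]
print(ctc_greedy_decode(probs, "ab"))  # aab
```

The blank column between the two runs of `a` is what allows CTC to output a doubled character; without it, the repeated frames would merge into a single `a`.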
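CER and WER, the two measures used in the comparison, are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and predicted transcriptions, divided by the reference length, at the character and word level respectively. The sketch below follows that common definition. The abstract does not state the paper's exact text normalization (case, punctuation, whitespace), so the whitespace tokenization used for WER here is an assumption, and `edit_distance`, `cer` and `wer` are hypothetical names.

```python
from typing import Sequence

# Hypothetical helper names; common CER/WER definitions, not the paper's evaluation code.


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance computed row by row with O(len(hyp)) memory."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate, assuming whitespace tokenization."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


print(cer("kitten", "sitting"))  # 0.5
```

Because both measures are normalized by the reference length, insertions can push them above 1.0. The `max(..., 1)` guard only avoids division by zero for an empty reference.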