Writer adaptation for E2E Arabic online handwriting recognition via adversarial multi task learning

Cited by: 0
Authors
Alwajih, Fakhraddin [1 ,2 ]
Badr, Eman [1 ,3 ]
Abdou, Sherif [1 ]
Affiliations
[1] Cairo Univ, Dept Informat Technol, Giza, Egypt
[2] Ibb Univ, Dept Comp Sci & Informat Technol, Ibb, Yemen
[3] Univ Sci & Technol, Zewail City Sci Technol & Innovat, Giza, Egypt
Keywords
Writer adaptation; Adversarial learning; Multi task learning; Arabic online handwriting recognition; Connectionist temporal classification; Convolutional neural networks; Bidirectional long short-term memory;
DOI
10.1016/j.eij.2022.02.007
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The importance of online handwriting recognition has grown rapidly in recent years owing to technological advances in handheld devices and in communication software with handwriting interfaces. Deep learning end-to-end (E2E) models have delivered high recognition rates in online handwriting recognition systems. However, reaching even higher performance requires supplementing these models with adaptation techniques that cater to individual penmanship. This study proposes a writer adaptation technique for Arabic online handwriting recognition systems that employs adversarial Multi-Task Learning (MTL). Adversarial training and MTL modify the deep-feature distribution of the Writer Dependent (WD) model so that its output closely resembles that of the Writer Independent (WI) model. The proposed method comprises two tasks: label classification (the primary task) and model-feature discrimination (the secondary task). The method is designed to jointly optimize both sub-networks. The technique was applied to an E2E Connectionist Temporal Classification (CTC) based model that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM). The proposed models were trained and evaluated on two large datasets, Online-KHATT and CHAW. In supervised adaptation, the method achieved an absolute Character Error Rate (CER) reduction of up to 1.83% and an absolute Word Error Rate (WER) reduction of 11.71% over the WI model. Supervised adaptation also achieved an absolute CER reduction of up to 0.84% and an absolute WER reduction of 6.77% over the fine-tuned model. In unsupervised adaptation, the method achieved an absolute CER reduction of up to 0.5% and an absolute WER reduction of 1.74% over the WI model.
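The abstract does not spell out the training objective. A common way to write adversarial MTL for speaker or writer adaptation, given here as an assumption rather than as the authors' exact loss, pairs a CTC loss for the primary task with a discriminator $D$ that tries to tell WD deep features from WI deep features, while the WD feature extractor is trained to fool it (for example through a gradient reversal layer). The symbols are our own: $F_{\text{WD}}(\cdot;\theta_f)$ is the WD CNN-BiLSTM feature extractor, $F_{\text{WI}}$ is the WI extractor (assumed frozen), $\theta_y$ is the CTC output layer, $\theta_d$ is the discriminator, $D$ outputs the probability that a feature came from the WI model, $x_i$ ranges over the target writer's inputs, $\lambda$ weights the secondary task, and $\mu$ is the learning rate.

```latex
\begin{aligned}
\mathcal{L}_{\text{d}}(\theta_f,\theta_d) &= -\frac{1}{N}\sum_{i=1}^{N}\Big[\log D\big(F_{\text{WI}}(x_i);\theta_d\big) + \log\Big(1 - D\big(F_{\text{WD}}(x_i;\theta_f);\theta_d\big)\Big)\Big] \\
(\hat\theta_f,\hat\theta_y) &= \arg\min_{\theta_f,\theta_y}\; \mathcal{L}_{\text{CTC}}(\theta_f,\theta_y) - \lambda\,\mathcal{L}_{\text{d}}(\theta_f,\hat\theta_d) \\
\hat\theta_d &= \arg\min_{\theta_d}\; \mathcal{L}_{\text{d}}(\hat\theta_f,\theta_d) \\[4pt]
\theta_f &\leftarrow \theta_f - \mu\Big(\frac{\partial \mathcal{L}_{\text{CTC}}}{\partial \theta_f} - \lambda\,\frac{\partial \mathcal{L}_{\text{d}}}{\partial \theta_f}\Big), \qquad
\theta_y \leftarrow \theta_y - \mu\,\frac{\partial \mathcal{L}_{\text{CTC}}}{\partial \theta_y}, \qquad
\theta_d \leftarrow \theta_d - \mu\,\frac{\partial \mathcal{L}_{\text{d}}}{\partial \theta_d}
\end{aligned}
```

The minus sign in the $\theta_f$ update is what a gradient reversal layer implements. The discriminator minimizes $\mathcal{L}_{\text{d}}$, while the WD extractor is pushed to maximize it, so its features drift toward the WI feature distribution while still minimizing the CTC loss.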
Our experimental results indicate that the proposed supervised writer adaptation can achieve significant improvements in recognition accuracy over both baseline models, the WI model and the fine-tuned model.
(c) 2022 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
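The reported gains are absolute reductions, i.e. the baseline's CER or WER minus the adapted model's, in percentage points. The following sketch shows how such figures are typically computed from a CTC model's frame-level outputs. It assumes best-path (greedy) CTC decoding with blank index 0 and standard Levenshtein-based CER/WER. The helper names, blank index, toy alphabet and example strings are hypothetical and are not the authors' evaluation code.

```python
from typing import Dict, List, Sequence

# Hypothetical convention: index 0 is the CTC blank symbol.
BLANK = 0


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: merge consecutive repeated labels, then drop blanks."""
    decoded: List[int] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit costs for substitution, insertion and deletion."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            row[j] = min(
                prev_row[j] + 1,             # deletion of r
                row[j - 1] + 1,              # insertion of h
                prev_row[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev_row = row
    return prev_row[-1]


def _error_rate(refs: Sequence[Sequence], hyps: Sequence[Sequence]) -> float:
    """Corpus-level error rate in percent: total edits / total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references must not be empty")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * edits / total


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character Error Rate (%), computed over Unicode code points."""
    return _error_rate(refs, hyps)


def wer(refs: List[str], hyps: List[str]) -> float:
    """Word Error Rate (%), computed over whitespace-separated words."""
    return _error_rate([r.split() for r in refs], [h.split() for h in hyps])


def absolute_reduction(baseline: float, adapted: float) -> float:
    """Absolute error-rate reduction in percentage points (baseline minus adapted)."""
    return baseline - adapted


if __name__ == "__main__":
    # Toy alphabet and frame-level argmax labels; purely illustrative.
    alphabet: Dict[int, str] = {1: "a", 2: "b", 3: "c"}
    frames = [1, 1, 0, 1, 2, 2, 0, 0, 3]
    print("".join(alphabet[k] for k in ctc_greedy_decode(frames)))  # → aabc

    refs = ["the cat sat"]
    wi_hyps = ["the bat sat"]       # hypothetical WI output: one wrong character
    adapted_hyps = ["the cat sat"]  # hypothetical adapted output: fully correct
    cer_gain = absolute_reduction(cer(refs, wi_hyps), cer(refs, adapted_hyps))
    wer_gain = absolute_reduction(wer(refs, wi_hyps), wer(refs, adapted_hyps))
    print(round(cer_gain, 2), round(wer_gain, 2))  # → 9.09 33.33
```

Corpus-level rates (total edits over total reference length) are used here. Averaging per-line rates instead gives different numbers, and the abstract does not say which convention the authors followed.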
Pages: 373-382
Number of pages: 10