Writer adaptation for E2E Arabic online handwriting recognition via adversarial multi task learning

Cited by: 0

Authors:

Alwajih, Fakhraddin [1,2]

Badr, Eman [1,3]

Abdou, Sherif [1]

Affiliations:

[1] Cairo Univ, Dept Informat Technol, Giza, Egypt

[2] Ibb Univ, Dept Comp Sci & Informat Technol, Ibb, Yemen

[3] Univ Sci & Technol, Zewail City Sci Technol & Innovat, Giza, Egypt

Source:

EGYPTIAN INFORMATICS JOURNAL | 2022 / Vol. 23 / Issue 03

Keywords:

Writer adaptation; Adversarial learning; Multi task learning; Arabic online handwriting recognition; Connectionist temporal classification; Convolutional neural networks; Bidirectional long short-term memory;

DOI:

10.1016/j.eij.2022.02.007

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The importance of online handwriting recognition has increased rapidly in recent years, driven by advances in handheld devices and communication software with handwriting interfaces. Deep learning end-to-end (E2E) models have provided high recognition rates as part of online handwriting recognition systems. However, attaining even higher performance levels requires supplementing these models with adaptation techniques that cater to individual penmanship. This study proposes a writer adaptation technique for Arabic online handwriting recognition systems that employs adversarial Multi-Task Learning (MTL). Adversarial training and MTL modify the deep-feature distribution of the Writer Dependent (WD) model, leading its output to closely resemble that of the Writer Independent (WI) model. The design of the proposed method entails two tasks: label classification (primary task) and model feature discrimination (secondary task). The method jointly optimizes both sub-networks. The proposed technique was tested on an E2E Connectionist Temporal Classification (CTC)-based model that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-term Memory (BiLSTM). The proposed models were trained and evaluated on two large datasets (Online-KHATT and CHAW). In supervised adaptation, the method achieved an absolute Character Error Rate (CER) reduction of up to 1.83% and an absolute Word Error Rate (WER) reduction of 11.71% over the WI model. Additionally, supervised adaptation achieved an absolute CER reduction of up to 0.84% and an absolute WER reduction of 6.77% over the fine-tuned model. In unsupervised adaptation, the proposed method achieved an absolute CER reduction of up to 0.5% and an absolute WER reduction of 1.74% over the WI model.
Our experimental results indicate that the proposed supervised writer adaptation achieves significant improvements in recognition accuracy compared with the baseline models (the WI and fine-tuned models). © 2022 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
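The abstract reports its gains as absolute reductions in CER and WER. As a rough illustration of how such figures are commonly computed, the following is a minimal Python sketch based on Levenshtein edit distance. The function names (`edit_distance`, `error_rate`, `absolute_reduction`) and the toy strings are hypothetical and do not come from the paper, and counting spaces as characters in CER is an assumption; the authors' exact scoring protocol is not stated in this record.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent: total edits / total reference length.

    unit="char" gives CER (assumption: spaces count as characters);
    unit="word" gives WER over whitespace-separated tokens.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return 100.0 * total_edits / total_len


def absolute_reduction(baseline_rate, adapted_rate):
    """Absolute reduction in percentage points (not a relative percentage)."""
    return baseline_rate - adapted_rate


if __name__ == "__main__":
    # Toy data only: these strings are illustrative, not from Online-KHATT or CHAW.
    refs = ["abc def"]
    wi_hyps = ["abd dex"]       # hypothetical writer-independent output
    adapted_hyps = ["abc dex"]  # hypothetical adapted output
    for unit in ("char", "word"):
        base = error_rate(refs, wi_hyps, unit)
        adapted = error_rate(refs, adapted_hyps, unit)
        print(unit, round(base, 2), round(adapted, 2),
              round(absolute_reduction(base, adapted), 2))
```

Under this reading, an "absolute WER reduction of 11.71%" means the adapted model's WER is 11.71 percentage points lower than the baseline's, rather than 11.71% lower in relative terms.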


Pages: 373-382

Page count: 10

15 items in total

[1] Writer adaptation via deeply learned features for online Chinese handwriting recognition
Du, Jun
Zhai, Jian-Fang
Hu, Jin-Shui
INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2017, 20 (01) : 69 - 78
[2] Writer adaptation via deeply learned features for online Chinese handwriting recognition
Jun Du
Jian-Fang Zhai
Jin-Shui Hu
International Journal on Document Analysis and Recognition (IJDAR), 2017, 20 : 69 - 78
[3] Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting
Kashiwagi, Yosuke
Futami, Hayato
Tsunoo, Emiru
Arora, Siddhant
Watanabe, Shinji
INTERSPEECH 2024, 2024, : 2900 - 2904
[4] Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
Deng, Keqi
Woodland, Philip C.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3507 - 3516
[5] E2E-based Multi-task Learning Approach to Joint Speech and Accent Recognition
Zhang, Jicheng
Peng, Yizhou
Pham, Van Tung
Xu, Haihua
Huang, Hao
Chng, Eng Siong
INTERSPEECH 2021, 2021, : 1519 - 1523
[6] Few-shot learning for E2E speech recognition: architectural variants for support set generation
Eledath, Dhanya
Thurlapati, Narasimha Rao
Pavithra, V
Banerjee, Tirthankar
Ramasubramanian, V
2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 444 - 448
[7] DIRECTIONAL ASR: A NEW PARADIGM FOR E2E MULTI-SPEAKER SPEECH RECOGNITION WITH SOURCE LOCALIZATION
Subramanian, Aswin Shanmugam
Weng, Chao
Watanabe, Shinji
Yu, Meng
Xu, Yong
Zhang, Shi-Xiong
Yu, Dong
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8433 - 8437
[8] Simultaneous Script Identification and Handwriting Recognition via Multi-Task Learning of Recurrent Neural Networks
Chen, Zhuo
Wu, Yichao
Yin, Pei
Liu, Cheng-Lin
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 525 - 530
[9] Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
Das, Nilaksh
Chau, Duen Horng
INTERSPEECH 2022, 2022, : 3839 - 3843
[10] Few-shot Learning for Low-resource E2E ASR: Mono-, Cross- and Multi-lingual Scenarios
Eledath, Dhanya
Ramasubramanian, V
2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024,
