Dialect-aware Semi-supervised Learning for End-to-End Multi-dialect Speech Recognition

Cited by: 0

Authors:

Shiota, Sayaka [1]

Imaizumi, Ryo [2]

Masumura, Ryo [1]

Kiya, Hitoshi [1]

Affiliations:

[1] Tokyo Metropolitan Univ, Dept Comp Sci, Tokyo, Japan

[2] NTT Corp, NTT Comp & Data Sci Labs, Tokyo, Japan

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose dialect-aware semi-supervised learning for end-to-end automatic speech recognition (ASR) models considering multi-dialect speech. Some multi-domain ASR tasks require a large amount of training data containing additional information (e.g., language or dialect), whereas it is difficult to prepare such data with accurate transcriptions. Semi-supervised learning is a method of using a massive amount of untranscribed data effectively, and it can be applied to multi-domain ASR tasks to relax the missing transcriptions problem. However, semi-supervised learning has usually used generated pseudo-transcriptions only. The problem is that simply combining a multi-domain model with semi-supervised learning makes use of no additional information even though the information can be obtained. Therefore, in this paper, we focus on semi-supervised learning based on a multi-domain model that takes additional domain information into account. Since the accuracy of pseudo-transcriptions can be improved by using the multi-domain model and additional information, our proposed semi-supervised learning is expected to provide a reliable ASR model. In experiments, we performed Japanese multi-dialect ASR as one type of multi-domain ASR. From the results, a model trained with the proposed method yielded the lowest character error rate compared with other models trained with the conventional semi-supervised method.

Pages: 240 - 244

Page count: 5

50 entries in total

[1] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
Imaizumi, Ryo
Masumura, Ryo
Shiota, Sayaka
Kiya, Hitoshi
2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020: 297 - 301
[2] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
Imaizumi, Ryo
Masumura, Ryo
Shiota, Sayaka
Kiya, Hitoshi
APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
[3] Semi-Supervised End-to-End Speech Recognition
Karita, Shigeki
Watanabe, Shinji
Iwata, Tomoharu
Ogawa, Atsunori
Delcroix, Marc
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2 - 6
[4] Tibetan Multi-Dialect Speech and Dialect Identity Recognition
Zhao, Yue
Yue, Jianjian
Song, Wei
Xu, Xiaona
Li, Xiali
Wu, Licheng
Ji, Qiang
CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 60 (03): 1223 - 1235
[5] Multi-Dialect Arabic Speech Recognition
Ali, Abbas Raza
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
[6] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
Yadavalli, Aditya
Mirishkar, Ganesh S.
Vuppala, Anil Kumar
INTERSPEECH 2022, 2022: 1387 - 1391
[7] Improving End-to-End Bangla Speech Recognition with Semi-supervised Training
Sadeq, Nafis
Chowdhury, Nafis Tahmid
Utshaw, Farhan Tanvir
Ahmed, Shafayat
Adnan, Muhammad Abdullah
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 1875 - 1883
[8] End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning
Tanaka, Tomohiro
Masumura, Ryo
Ihori, Mana
Takashima, Akihiko
Orihashi, Shota
Makishima, Naoki
INTERSPEECH 2021, 2021: 4458 - 4462
[9] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION USING TEXT-TO-SPEECH AND AUTOENCODERS
Karita, Shigeki
Watanabe, Shinji
Iwata, Tomoharu
Delcroix, Marc
Ogawa, Atsunori
Nakatani, Tomohiro
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6166 - 6170
[10] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
Ghorbani, Shahram
Hansen, John H. L.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 762 - 774
