Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Times cited: 0
Authors
Matsuura, Kohei [1 ]
Ueno, Sei [1 ]
Mimura, Masato [1 ]
Sakai, Shinsuke [1 ]
Kawahara, Tatsuya [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
Ainu speech corpus; low-resource language; end-to-end speech recognition; Japanese
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to preserve the culture, only a very limited portion of them has been transcribed so far. We therefore started a project on automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report the development of the speech corpus and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85% respectively in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves accuracy on the speaker-open test set.
Pages: 2622-2628
Number of pages: 7
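The abstract reports that syllable units gave the best recognition accuracy among the four modeling units. As a rough illustration of what syllable-level units might look like for romanized Ainu text, the sketch below greedily segments words into (C)V(C) syllables. This is not the authors' actual tokenizer; the consonant and vowel inventories, the function name syllabify, and the greedy segmentation rule are assumptions made only for this example.

import re

# Minimal sketch (assumed inventories, not the paper's method): split romanized
# Ainu words into syllable-like (C)V(C) units, the unit type the paper reports
# as performing best for end-to-end ASR.
CONSONANTS = "ptkcsmnrwyh'"   # assumed consonant inventory for this sketch
VOWELS = "aiueo"              # assumed vowel inventory for this sketch

SYLLABLE = re.compile(
    rf"[{re.escape(CONSONANTS)}]?"                      # optional onset consonant
    rf"[{VOWELS}]"                                      # obligatory vowel nucleus
    rf"(?:[{re.escape(CONSONANTS)}](?![{VOWELS}]))?",   # coda only if it does not start the next syllable
    re.IGNORECASE,
)

def syllabify(word: str) -> list[str]:
    """Greedy left-to-right (C)V(C) segmentation of a romanized Ainu word."""
    return SYLLABLE.findall(word)

if __name__ == "__main__":
    # "irankarapte" is a common Ainu greeting; under this simple rule it
    # segments as ['i', 'ran', 'ka', 'rap', 'te'].
    print(syllabify("irankarapte"))

A greedy (C)V(C) rule covers the basic Ainu syllable shape, but a real tokenizer for the transcribed corpus would need to handle digraphs, glottal-stop marking, and any special symbols used in the transcription conventions.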