Chinese dialect speech recognition: a comprehensive survey

Cited by: 1
Authors
Li, Qiang [1]
Mai, Qianyu [1]
Wang, Mandou [1]
Ma, Mingjuan [2]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan 750021, Peoples R China
[2] North Minzu Univ, Sch Econ, Yinchuan 750021, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Chinese dialect; Dialect corpus; Dialectal acoustic modeling; Automatic speech recognition; Deep neural network; End-to-end; EMOTION RECOGNITION; MODELS; IDENTIFICATION; FEATURES; CORPUS;
DOI
10.1007/s10462-023-10668-0
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a multi-ethnic country with a large population, China has a wide variety of dialects, which poses considerable challenges for speech recognition. Owing to geographical location, population migration, and other factors, the research progress and practical application of Chinese dialect speech recognition are currently at different stages. Exploring the significant regional heterogeneity in recognition approaches and their performance, dialect corpora, and other resources is therefore of vital importance for Chinese speech recognition. We first start with the regional classification of dialects and analyze their pivotal acoustic characteristics, including specific vowel and tone patterns. Second, we comprehensively summarize the existing dialect phonetic corpora in China, which helps in exploring general methods for constructing such corpora. We also describe the general process of dialect recognition, and summarize and introduce in detail several critical dialect recognition approaches, especially the hybrid method that combines an Artificial Neural Network (ANN) with a Hidden Markov Model (HMM), as well as End-to-End (E2E) methods. Third, through an in-depth comparison of their principles, merits, disadvantages, and recognition performance on different dialects, we point out future development trends and challenges in dialect recognition. Finally, some application examples of dialect speech recognition are collected and discussed.
Pages: 39
Related papers
50 in total
  • [21] Sichuan dialect speech recognition with deep LSTM network
    Ying, Wangyang
    Zhang, Lei
    Deng, Hongli
    FRONTIERS OF COMPUTER SCIENCE, 2020, 14 (02) : 378 - 387
  • [22] Machine Learning Paradigms for Speech Recognition of an Indian Dialect
    Londhe, N. D.
    Ahirwal, M. K.
    Lodha, P.
    2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016, : 780 - 786
  • [23] Sichuan dialect speech recognition with deep LSTM network
    Wangyang Ying
    Lei Zhang
    Hongli Deng
    Frontiers of Computer Science, 2020, 14 : 378 - 387
  • [24] Emotion Recognition from Spontaneous Tunisian Dialect Speech
    Nasr, Latifa Ibn
    Masmoudi, Abir
    Belguith, Lamia hadrich
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2025, 24 (02)
  • [25] An Annotated Speech Corpus of Rare Dialect for Recognition-Take Dali Dialect as an Example
    Huang, Tian
    Yang, Dongqi
    Qin, Wanyun
    Zhang, Shubo
    Li, Binyang
    Li, Yan
    COGNITIVE COMPUTING, ICCC 2021, 2022, 12992 : 3 - 13
  • [26] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301
  • [27] Modelling the Tongue Movement of Chinese Shaanxi Xi'an Dialect Speech
    Lu, Zhao
    Czap, Laszlo
    2018 19TH INTERNATIONAL CARPATHIAN CONTROL CONFERENCE (ICCC), 2018, : 98 - 103
  • [28] MinSpeech: A Corpus of Southern Min Dialect for Automatic Speech Recognition
    Lin, Jiayan
    Lu, Shenghui
    Huang, Hukai
    Guan, Wenhao
    Xu, Binbin
    Bu, Hui
    Hong, Qingyang
    Li, Lin
    INTERSPEECH 2024, 2024, : 2330 - 2334
  • [29] Learning Aided Mood and Dialect Recognition using Telephonic Speech
    Sharma, Mridusmita
    Sarma, Kandarpa Kumar
    2016 INTERNATIONAL CONFERENCE ON ACCESSIBILITY TO DIGITAL WORLD (ICADW), 2016, : 163 - 167
  • [30] Two-stage Training for Chinese Dialect Recognition
    Ren, Zongze
    Yang, Guofu
    Xu, Shugong
    INTERSPEECH 2019, 2019, : 4050 - 4054