Chinese dialect speech recognition: a comprehensive survey

Cited by: 1
Authors
Li, Qiang [1]
Mai, Qianyu [1]
Wang, Mandou [1]
Ma, Mingjuan [2]
Affiliations
[1] North Minzu Univ, Sch Comp Sci & Engn, Yinchuan 750021, Peoples R China
[2] North Minzu Univ, Sch Econ, Yinchuan 750021, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese dialect; Dialect corpus; Dialectal acoustic modeling; Automatic speech recognition; Deep neural network; End-to-end; EMOTION RECOGNITION; MODELS; IDENTIFICATION; FEATURES; CORPUS;
DOI
10.1007/s10462-023-10668-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a multi-ethnic country with a large population, China has a wide diversity of dialects, which poses considerable challenges for speech recognition. Owing to geographical location, population migration, and other factors, research progress and practical application of Chinese dialect speech recognition are currently at different stages across regions. Exploring the significant regional heterogeneity in recognition approaches and performance, dialect corpora, and other resources is therefore of vital importance for Chinese speech recognition. We first start from the regional classification of dialects and analyze their pivotal acoustic characteristics, including specific vowel and tone patterns. Second, we comprehensively summarize the existing dialect phonetic corpora in China, which helps in exploring general methods for constructing such corpora. We then describe the general process of dialect recognition and summarize several critical recognition approaches in detail, especially the hybrid method that combines an Artificial Neural Network (ANN) with a Hidden Markov Model (HMM), and the End-to-End (E2E) approach. Third, through an in-depth comparison of their principles, merits, disadvantages, and recognition performance on different dialects, we point out future development trends and challenges in dialect recognition. Finally, we collect and discuss some application examples of dialect speech recognition.
Pages: 39
Source: Artificial Intelligence Review, 57