COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Authors
Escobar-Grisales, D. [1]
Rios-Urrego, C. D. [1]
Lopez-Santander, D. A. [1]
Gallo-Aristizabal, J. D. [1]
Vasquez-Correa, J. C. [1, 2, 3]
Noeth, E. [2]
Orozco-Arroyave, J. R. [1, 2]
Affiliations
[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany
[3] Pratech Grp, Medellin, Colombia
Keywords
Dialect classification; Speech; Text; Customer Service; Acoustics; Language processing; LANGUAGE;
DOI
10.1109/ASRU51503.2021.9687890
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dialect recognition is useful in many industrial sectors, particularly for improving the interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies depending on geographic location, birthplace, and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses, using information from speech recordings and their corresponding transliterations. On the one hand, classical approaches are used: speech is modeled with prosody features, Mel-frequency cepstral coefficients (MFCCs), and mean Hilbert envelope coefficients (MHECs), while text is modeled with Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) embeddings. On the other hand, a deep learning approach is applied using convolutional neural networks trained on spectrograms for speech and on embedding matrices for text. The implemented deep learning models seem to be more promising than the classical ones for the addressed problem. Further experiments will be considered to validate this claim across a wider spectrum of methods.
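The abstract mentions spectrograms as the speech input to the convolutional networks but gives no extraction settings. The following is a minimal sketch of how a log-magnitude spectrogram might be computed with NumPy. The function name `log_spectrogram`, the 25 ms frame length, the 10 ms hop, and the Hann window are all illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a (n_frames, n_bins) log-magnitude spectrogram.

    frame_ms and hop_ms are assumed values chosen for illustration;
    the source abstract does not state the settings that were used.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    # Index matrix that slices the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]

    # Taper each frame with a Hann window, then take the one-sided FFT.
    frames = signal[idx] * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))

    # A small offset avoids log(0) in silent regions.
    return np.log(magnitude + 1e-10)


# Usage on a synthetic 440 Hz tone (a stand-in for a real recording).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = log_spectrogram(tone, sr)
print(spec.shape)
```

The resulting two-dimensional array can be treated like a single-channel image, which is the usual way such a representation is passed to a convolutional network.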
Pages: 556-563
Page count: 8