COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Cited by: 4
Authors
Escobar-Grisales, D. [1]
Rios-Urrego, C. D. [1]
Lopez-Santander, D. A. [1]
Gallo-Aristizabal, J. D. [1]
Vasquez-Correa, J. C. [1,2,3]
Noeth, E. [2]
Orozco-Arroyave, J. R. [1,2]
Affiliations
[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany
[3] Pratech Grp, Medellin, Colombia
Keywords
Dialect classification; Speech; Text; Customer service; Acoustics; Language processing; Language
DOI
10.1109/ASRU51503.2021.9687890
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dialect recognition is useful in many industrial sectors, particularly with the aim of enabling better interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies depending on geographic location, birthplace, and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses using information from speech recordings and their corresponding transcriptions. On the one hand, classical approaches are used to model speech, including prosody features, Mel-frequency cepstral coefficients, and mean Hilbert envelope coefficients. For text models, Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) embeddings are considered. On the other hand, a deep learning approach is applied by considering convolutional neural networks, which are trained using spectrograms and embedding matrices for speech and text, respectively. The implemented deep learning models seem more promising than the classical ones for the addressed problem. Further experiments will be conducted to validate this claim across a wider spectrum of methods.
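
To make the pipeline above concrete, here is a minimal sketch of the classical speech branch (MFCC statistics fed to a conventional classifier) on a toy corpus. This is not the authors' code: librosa and scikit-learn are assumed tools, the RBF-kernel SVM stands in for the unnamed classical classifier, the prosody and mean Hilbert envelope features are omitted, and all file paths and labels are hypothetical.

# Sketch of the classical speech branch described in the abstract.
# Assumptions (not from the paper): librosa for MFCC extraction, an
# RBF-kernel SVM as the classifier, hypothetical file paths and labels.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_stats(path, sr=16000, n_mfcc=13):
    """Summarize a recording's MFCC trajectory with per-coefficient mean and std."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2 * n_mfcc,)

# Hypothetical corpus: 0 = "Antioqueño", 1 = "Bogotano".
paths = ["call_001.wav", "call_002.wav", "call_003.wav", "call_004.wav"]
labels = np.array([0, 0, 1, 1])

X = np.stack([mfcc_stats(p) for p in paths])  # (n_files, 2 * n_mfcc)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X))  # dialect predictions for the training files

The deep branch described in the abstract would instead compute a (mel-)spectrogram per utterance (e.g., with librosa.feature.melspectrogram) and train a 2-D convolutional network on it, while the text branch would replace acoustic features with Word2Vec or BERT embedding matrices.
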
Pages: 556-563
Page count: 8
Related papers
50 items in total
  • [21] Multimodal emotion recognition based on speech and ECG signals
    Huang C.
    Jin Y.
    Wang Q.
    Zhao L.
    Zou C.
    Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2010, 40 (05): 895-900
  • [22] Speech Recognition via fNIRS Based Brain Signals
    Liu, Yichuan
    Ayaz, Hasan
    FRONTIERS IN NEUROSCIENCE, 2018, 12
  • [23] Approximated mutual information training for speech recognition using myoelectric signals
    Guo, Hua J.
    Chan, A. D. C.
    2006 28TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-15, 2006: 96-99
  • [24] Language dialect based speech emotion recognition through deep learning techniques
    Rajendran, Sukumar
    Mathivanan, Sandeep Kumar
    Jayagopal, Prabhu
    Venkatasen, Maheshwari
    Pandi, Thanapal
    Sorakaya Somanathan, Manivannan
    Thangaval, Muthamilselvan
    Mani, Prasanna
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03): 625-635
  • [26] Farsi Font Recognition Based on the Fonts of Text Samples Extracted by SOM
    Ziaratban, Majid
    Bagheri, Fatemeh
    JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE-JMCS, 2015, 15 (01): 40-56
  • [27] Text information hiding based on Part of Speech grammar
    Dai Zuxu
    Hong Fan
    Yang Muxiang
    Cui Guohua
    CIS WORKSHOPS 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY WORKSHOPS, 2007: 632+
  • [29] Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images
    Iwano, Koji
    Yoshinaga, Tomoaki
    Tamura, Satoshi
    Furui, Sadaoki
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2007, 2007 (1)
  • [30] Emotion recognition and evaluation from Mandarin speech signals
    Pao, Tsang-Long
    Chen, Yu-Te
    Yeh, Jun-Heng
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2008, 4 (07): 1695-1709