COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Times cited: 4
Authors
Escobar-Grisales, D. [1 ]
Rios-Urrego, C. D. [1 ]
Lopez-Santander, D. A. [1 ]
Gallo-Aristizabal, J. D. [1 ]
Vasquez-Correa, J. C. [1 ,2 ,3 ]
Noeth, E. [2 ]
Orozco-Arroyave, J. R. [1 ,2 ]
Affiliations
[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany
[3] Pratech Grp, Medellin, Colombia
Keywords
Dialect classification; Speech; Text; Customer Service; Acoustics; Language processing; Language
DOI
10.1109/ASRU51503.2021.9687890
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dialect recognition is useful in many industrial sectors, particularly for enabling better interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies according to geographic location, birthplace, and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses of speech recordings and their corresponding transcriptions. On the one hand, classical approaches are used: speech is modeled with prosody features, Mel-frequency cepstral coefficients (MFCCs), and mean Hilbert envelope coefficients, while text is modeled with Word2Vec and bidirectional encoder representations from transformers (BERT) embeddings. On the other hand, a deep learning approach is applied using convolutional neural networks (CNNs) trained on spectrograms for speech and on embedding matrices for text. The deep learning models appear more promising than the classical ones for this problem, although further experiments covering a wider range of methods are needed to validate this claim.
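To make the text branch more concrete, the following is a minimal sketch of how a CNN can consume an "embedding matrix": each token of a transcription becomes one row, a filter slides over consecutive word positions, and global max-pooling yields one feature per filter before a logistic two-class output. It is not the authors' architecture. The function names (`embedding_matrix`, `conv1d_maxpool`, `predict_proba`), the sizes `DIM` and `MAX_LEN`, the filter widths, and the random vectors standing in for Word2Vec/BERT embeddings are all hypothetical choices made for illustration, and the weights are untrained.

```python
import math
import random

# Hypothetical sizes for illustration only (not taken from the paper).
DIM = 4        # embedding dimension
MAX_LEN = 8    # transcriptions are padded/truncated to this many tokens


def embedding_matrix(tokens, embeddings, max_len=MAX_LEN, dim=DIM):
    """Stack one embedding row per token; unknown words and padding become zero rows."""
    rows = [list(embeddings.get(t, [0.0] * dim)) for t in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def conv1d_maxpool(matrix, kernel, bias):
    """Slide a (width x dim) kernel over word positions, apply ReLU, then max-pool over time."""
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("matrix has fewer rows than the kernel width")
    activations = []
    for start in range(len(matrix) - width + 1):
        s = bias
        for i in range(width):
            for j, w in enumerate(kernel[i]):
                s += w * matrix[start + i][j]
        activations.append(max(0.0, s))  # ReLU
    return max(activations)


def predict_proba(matrix, filters, out_weights, out_bias):
    """One pooled feature per filter, then a logistic output for the two-dialect decision."""
    features = [conv1d_maxpool(matrix, k, b) for k, b in filters]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["buenas", "tardes", "señor", "con", "mucho", "gusto"]
    # Random vectors stand in for pretrained Word2Vec/BERT embeddings (assumption).
    emb = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
    # Untrained random filters of widths 2 and 3; in practice these are learned.
    filters = [([[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)], 0.0)
               for width in (2, 3)]
    m = embedding_matrix("buenas tardes señor".split(), emb)
    p = predict_proba(m, filters, out_weights=[0.5, -0.5], out_bias=0.0)
    print(f"P(class 1) = {p:.3f}")  # meaningless until the weights are trained
```

Padding every transcription to a fixed `MAX_LEN` keeps the matrix shape constant, and max-pooling over time makes each filter's feature independent of where in the utterance the matching word pattern occurs.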
Pages: 556-563
Number of pages: 8