COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Cited by: 4
Authors
Escobar-Grisales, D. [1 ]
Rios-Urrego, C. D. [1 ]
Lopez-Santander, D. A. [1 ]
Gallo-Aristizabal, J. D. [1 ]
Vasquez-Correa, J. C. [1 ,2 ,3 ]
Noeth, E. [2 ]
Orozco-Arroyave, J. R. [1 ,2 ]
Affiliations
[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany
[3] Pratech Grp, Medellin, Colombia
Keywords
Dialect classification; Speech; Text; Customer service; Acoustics; Language processing; Language
DOI
10.1109/ASRU51503.2021.9687890
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dialect recognition is useful in many industrial sectors, particularly for improving the interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies depending on geographic location, birthplace, and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses, using information from speech recordings and their corresponding transliterations. On the one hand, classical approaches are used: speech is modeled with prosody features, Mel-frequency cepstral coefficients, and mean Hilbert envelope coefficients, while text is modeled with Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) embeddings. On the other hand, a deep learning approach is applied using convolutional neural networks, which are trained on spectrograms for speech and on embedding matrices for text. The implemented deep learning models seem to be more promising than the classical ones for the addressed problem. Further experiments will be considered to validate this claim across a wider spectrum of methods.
Pages: 556-563
Number of pages: 8