Accent Recognition Using a Spectrogram Image Feature-Based Convolutional Neural Network

Cited by: 6
Author
Cetin, Onursal [1]
Affiliation
[1] Bandirma Onyedi Eylul Univ, Elect & Elect Engn Dept, TR-10200 Balikesir, Turkey
Keywords
Regional accent recognition; Spectrogram; Convolutional neural network; Transfer learning; I-vector; Sound event classification; Frequency characteristics; CNN
DOI
10.1007/s13369-022-07086-9
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract
Accent recognition is a significant area of research whose importance has grown in recent years. Numerous studies on various languages have sought to improve the performance of accent recognition systems; however, recognizing a language's regional accents remains a challenging problem. In this study, regional accents of British English were recognized with a convolutional neural network (CNN) in both gender-independent and gender-dependent experiments. Many different acoustic features have been used in previous studies, yet there is still no generally accepted feature set, and selecting handcrafted features is a difficult task. Moreover, converting audio signals into images in the most appropriate way is critical for a CNN, a deep learning model commonly used in image applications. To exploit the CNN's ability to characterize two-dimensional signals, spectrogram image features, which visualize the frequency distribution of the speech signal, were used. For this purpose, the sound signals were first segmented prior to normalization. The fast Fourier transform of each segment was taken and the results were combined. The absolute value was then taken, and a log function was applied to compress the dynamic range of these linear rate maps, yielding log-power rate maps. After a grayscale image was formed by normalizing the resulting time-frequency matrix to the range [0, 1], its dynamic range was quantized into red, green, and blue color values to generate a monochrome image. In this way, the time-consuming and challenging feature extraction process was simplified by combining spectrogram images with a CNN. In addition, although training and test data should ideally have a uniform distribution, heterogeneity in the data degrades the performance of machine learning algorithms.
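The feature-extraction steps described above (segmentation, FFT, magnitude, log compression, [0, 1] normalization, color quantization) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the frame length (512 samples), hop size (256), Hann window, number of quantization levels (64), and the hypothetical blue-to-red `ramp_colormap` are all choices made here, since the abstract does not specify them.

```python
import numpy as np


def ramp_colormap(n_levels):
    """Hypothetical blue-to-red colormap: one RGB triple in [0, 1] per level."""
    t = np.linspace(0.0, 1.0, n_levels)
    return np.stack([t, 1.0 - np.abs(2.0 * t - 1.0), 1.0 - t], axis=1)


def spectrogram_image(signal, frame_len=512, hop=256, n_levels=64, eps=1e-10):
    """Return (grayscale, rgb) spectrogram images; all parameters are assumptions."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # 1) Segment the signal into overlapping, Hann-windowed frames.
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)

    # 2) FFT of each segment, combined into a (frequency x time) matrix.
    spectrum = np.fft.rfft(frames, axis=1).T

    # 3) Absolute value, then log to compress the dynamic range (log-power map).
    log_power = np.log(np.abs(spectrum) ** 2 + eps)

    # 4) Normalize the time-frequency matrix to [0, 1] -> grayscale image.
    lo, hi = log_power.min(), log_power.max()
    if hi > lo:
        gray = (log_power - lo) / (hi - lo)
    else:
        gray = np.zeros_like(log_power)

    # 5) Quantize [0, 1] into n_levels bins and map each bin to an RGB color.
    levels = np.minimum((gray * n_levels).astype(int), n_levels - 1)
    rgb = ramp_colormap(n_levels)[levels]
    return gray, rgb
```

For a one-second clip at 16 kHz, `gray` is a 257 x 61 matrix in [0, 1] and `rgb` adds a trailing color axis of size 3; in a pipeline like the one described, that color image would then be resized to the CNN's input size.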
To mitigate the effect of data heterogeneity and improve the model's performance, transfer learning, a state-of-the-art technique, was used to transfer knowledge from an AlexNet model pre-trained on 1.3 million images from the ImageNet database. Several performance metrics, namely accuracy, specificity, sensitivity, precision, and F-score, were used to evaluate the proposed approach. Accuracies of 92.92% and 93.38% and F-scores of 92.67% and 93.19% were obtained for the gender-independent and gender-dependent experiments, respectively. Additionally, i-vector-based linear discriminant analysis and support vector machine methods were applied, and their results are presented for comparison with the proposed recognition method.
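The evaluation metrics named above can be computed from a multi-class confusion matrix by treating each accent as a one-vs-rest problem. The sketch below assumes that rows hold true labels, columns hold predictions, and per-class values are macro-averaged; the abstract does not state which averaging was used, so these are assumptions, and the toy matrix is illustrative rather than taken from the paper.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def classification_metrics(conf):
    """Macro-averaged metrics from a confusion matrix (rows = true, cols = predicted)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    tp = np.diag(conf)              # correctly recognized samples per accent
    fn = conf.sum(axis=1) - tp      # samples of the accent that were missed
    fp = conf.sum(axis=0) - tp      # samples wrongly assigned to the accent
    tn = total - tp - fp - fn       # all remaining samples

    sensitivity = _safe_div(tp, tp + fn)  # also called recall
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    f_score = _safe_div(2 * precision * sensitivity, precision + sensitivity)

    return {
        "accuracy": float(tp.sum() / total),
        "sensitivity": float(sensitivity.mean()),
        "specificity": float(specificity.mean()),
        "precision": float(precision.mean()),
        "f_score": float(f_score.mean()),
    }


if __name__ == "__main__":
    # Toy 3-accent confusion matrix (illustrative numbers, not from the paper).
    example = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]
    print(classification_metrics(example))
```

Macro averaging weights every accent equally regardless of how many speakers it has; a weighted average would instead favor the larger classes, which matters when the data are heterogeneous.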
Pages: 1973-1990
Number of pages: 18