Speaker age and gender recognition using 1D and 2D convolutional neural networks

Cited by: 0
Author
Ergün Yücesoy
Affiliation
[1] Ordu University, Vocational School of Technical Sciences
Keywords
Speaker age and gender recognition; CNN; MFCC; Hyperparameter tuning; Deep learning
DOI
Not available
Abstract
The speech signal is one of the most effective data sources used in human–computer interaction and is widely used in applications such as speech and speaker recognition, emotion recognition, language recognition, and age and gender recognition. In this study, two convolutional neural networks, one 1D and one 2D, are designed to recognize the age and gender class of the speaker. Each model is built by stacking four feature learning blocks (FLBs) and one classification block. Two different feature vectors, formed from mel-frequency cepstral coefficients (MFCCs), are used as their inputs. Each FLB consists of a convolution layer, a batch normalization layer, a ReLU layer, a max pooling layer, and a dropout layer, while the classification block consists of a flatten layer, two fully connected layers, and a softmax layer. Besides parameter optimization by manual search, model optimization is also carried out by trying different combinations of the basic components that make up the FLBs. In experiments with the Common Voice Turkish dataset, the highest validation accuracy obtained is 66.26% for the 1D model and 94.40% for the 2D model. These results demonstrate the effectiveness of the proposed 2D model in age and gender recognition.
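The block structure described in the abstract can be illustrated with a small shape-tracing sketch. It is not the author's implementation: the abstract gives no filter counts, kernel sizes, pooling sizes, input dimensions, or number of classes, so every such value below is a hypothetical placeholder. The sketch assumes stride-1 convolutions with "same" padding and only tracks how tensor shapes change through four FLBs and the classification block.

```python
from dataclasses import dataclass


@dataclass
class FLB:
    """One feature learning block: conv -> batch norm -> ReLU -> max pool -> dropout."""
    filters: int     # hypothetical number of convolution filters
    pool: int = 2    # hypothetical max-pooling window and stride


def flb_output_shape(shape, block):
    """Return the (spatial..., channels) shape after one FLB.

    Assumes a stride-1 convolution with 'same' padding, so the convolution
    keeps the spatial dimensions and sets the channel count to block.filters.
    Batch normalization, ReLU, and dropout do not change the shape.
    Max pooling floor-divides every spatial dimension by block.pool.
    """
    *spatial, _channels = shape
    pooled = [dim // block.pool for dim in spatial]
    return (*pooled, block.filters)


def trace_model(input_shape, blocks, hidden_units, num_classes):
    """List the tensor shape after the input, each FLB, and each classifier layer."""
    shapes = [tuple(input_shape)]
    for block in blocks:
        shapes.append(flb_output_shape(shapes[-1], block))
    flat = 1
    for dim in shapes[-1]:
        flat *= dim
    # Classification block: flatten -> fully connected -> fully connected -> softmax.
    # Softmax keeps the shape of the second fully connected layer.
    shapes.extend([(flat,), (hidden_units,), (num_classes,)])
    return shapes


# Four stacked FLBs, as in the abstract; filter counts are assumptions.
blocks = [FLB(32), FLB(64), FLB(128), FLB(256)]

# Hypothetical inputs: a 128-element MFCC vector (1D) and a 64x64 MFCC map (2D).
shapes_1d = trace_model((128, 1), blocks, hidden_units=128, num_classes=6)
shapes_2d = trace_model((64, 64, 1), blocks, hidden_units=128, num_classes=6)

for name, shapes in (("1D", shapes_1d), ("2D", shapes_2d)):
    print(name, " -> ".join(str(s) for s in shapes))
```

The same list of blocks serves both variants because the only difference tracked here is the number of spatial dimensions in the input; in a real model the 1D and 2D convolutions would of course be different layer types.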
Pages: 3065-3075
Page count: 10
Related papers
50 in total
  • [1] Speaker age and gender recognition using 1D and 2D convolutional neural networks
    Yücesoy, Ergün
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (06): : 3065 - 3075
  • [2] Evaluation of 1D and 2D Deep Convolutional Neural Networks for Driving Event Recognition
    Escotta, Alvaro Teixeira
    Beccaro, Wesley
    Ramirez, Miguel Arjona
    SENSORS, 2022, 22 (11)
  • [3] Novel 1D and 2D Convolutional Neural Networks for Facial and Speech Emotion Recognition
    Bodavarapu, Pavan Nageswar Reddy
    Reddy, B. Gowtham Kumar
    Srinivas, P. V. V. S.
    THIRD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND CAPSULE NETWORKS (ICIPCN 2022), 2022, 514 : 374 - 384
  • [4] Efficient analysis of hydrological connectivity using 1D and 2D Convolutional Neural Networks
    Nguyen, Chi
    Tan, Chang Wei
    Daly, Edoardo
    Pauwels, Valentijn R. N.
    ADVANCES IN WATER RESOURCES, 2023, 182
  • [5] Human Activity Recognition Using 2D Convolutional Neural Networks
    Gholamrezaii, Marjan
    Almodarresi, Seyed Mohammad Taghi
    2019 27TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2019), 2019, : 1682 - 1686
  • [6] Automatic snoring detection using a hybrid 1D–2D convolutional neural network
    Ruixue Li
    Wenjun Li
    Keqiang Yue
    Rulin Zhang
    Yilin Li
    Scientific Reports, 13
  • [7] Heartbeat Classification Using 1D Convolutional Neural Networks
    Shaker, Abdelrahman M.
    Tantawi, Manal
    Shedeed, Howida A.
    Tolba, Mohamed F.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2019, 2020, 1058 : 502 - 511
  • [8] Hilbert Vector Convolutional Neural Network: 2D Neural Network on 1D Data
    Loka, Nasrulloh R. B. S.
    Kavitha, Muthusubash
    Kurita, Takio
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 458 - 470
  • [9] Speech emotion recognition using deep 1D & 2D CNN LSTM networks
    Zhao, Jianfeng
    Mao, Xia
    Chen, Lijiang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2019, 47 : 312 - 323
  • [10] Text-Independent Speaker Identification with Glottal Flow and 1D Convolutional Neural Networks
    Camarena-Ibarrola, Antonio
    Ruiz-Gaona, Erick
    Figueroa, Karina
    PATTERN RECOGNITION, MCPR 2024, 2024, 14755 : 287 - 296