Infant cry classification by MFCC feature extraction with MLP and CNN structures

Cited by: 15
Authors
Abbaskhah, Ahmad [1, 4]
Sedighi, Hamed [2, 3, 5]
Marvi, Hossein [4]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Sharif, Iran
[2] Beijing Inst Technol, Sch Aerosp & Engn, Beijing, Peoples R China
[3] Shahrood Univ Technol, Fac Mech Engn, Shahrood, Iran
[4] Shahrood Univ Technol, Fac Elect Engn, Shahrood, Iran
[5] Shahrood Univ Technol, Fac Mech Engn, Shahrood 3619995161, Iran
Keywords
Infant cry; Mel-frequency Cepstral Coefficient; Multilayer perceptron; Support vector machine; Convolutional neural network; SMOTE; Classification; IDENTIFICATION
DOI
10.1016/j.bspc.2023.105261
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In this study, Dunstan's infant cry data set is pre-processed with a feature-vector approach comprising MFCC (19 features) and energy (one feature). Using the extracted features with Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN) classifiers, five classes of infant cry ("Neh" = hungry; "Eh" = needs to burp; "Owh" = tired; "Eairh" = stomach cramp; "Heh" = physical discomfort) are distinguished. The proposed MLP and CNN structures are analyzed in terms of loss and accuracy per epoch; in addition, classifier performance is evaluated with AUC-ROC, the confusion matrix, accuracy, F1 score, recall, and precision. All three classifiers are analyzed, and the results show that the designed CNN model performs best and that performance improves as model complexity increases. Each classifier is run 10 times; the average accuracies of SVM on SMOTE and non-SMOTE data are 0.823 ± 0.02 and 0.861 ± 0.02, respectively. The corresponding accuracies are 0.876 ± 0.01 and 0.892 ± 0.01 for MLP, and 0.921 ± 0.005 and 0.911 ± 0.005 for CNN. In the best case, the proposed CNN structure reaches an accuracy of 92.1% on the five classes of infant cries.
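The abstract names SMOTE but does not describe how it was applied. As a rough illustration of the general idea only (not the authors' implementation), the sketch below generates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours. The function name `smote_oversample`, the choice `k=5`, and the random placeholder data are assumptions made for this example; only the 20-dimensional feature size (19 MFCC + 1 energy) comes from the abstract.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic rows from minority-class rows."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot use more neighbours than other samples exist

    # Pairwise Euclidean distances within the minority class
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # shape (n, k)

    # Pick a base sample, one of its k neighbours, and a random position between them
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Placeholder data: 12 minority-class vectors of 20 features (19 MFCC + 1 energy)
minority = np.random.default_rng(42).normal(size=(12, 20))
synthetic = smote_oversample(minority, n_new=30)
print(synthetic.shape)  # (30, 20)
```

Because each synthetic row is a convex combination of two real rows, it always stays inside the per-feature range of the original minority samples. In practice a maintained implementation, such as the one in the imbalanced-learn package, would normally be preferred over a hand-written version.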
Pages: 11