Ensemble System of Deep Neural Networks for Single-Channel Audio Separation

Cited: 2
Authors
Al-Kaltakchi, Musab T. S. [1 ]
Mohammad, Ahmad Saeed [2 ]
Woo, Wai Lok [3 ]
Affiliations
[1] Mustansiriyah Univ, Coll Engn, Dept Elect Engn, Baghdad, Iraq
[2] Mustansiriyah Univ, Coll Engn, Dept Comp Engn, Baghdad, Iraq
[3] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Keywords
single-channel audio separation; deep neural networks; ideal binary mask; feature fusion; extreme learning machine; nonnegative matrix factorization; speech separation; algorithm
DOI
10.3390/info14070352
Chinese Library Classification
TP (Automation and Computer Technology)
Subject Classification Code
0812
Abstract
Speech separation is a well-known problem, especially when only one sound mixture is available. Estimating the Ideal Binary Mask (IBM) is one solution to this problem. Recent research has focused on the supervised classification approach, for which the extraction of features from the sources is a critical challenge. Speech separation has been accomplished using a variety of feature extraction models; the majority of them, however, concentrate on a single feature, and the complementary nature of diverse features has not been thoroughly investigated. In this paper, we propose a deep neural network (DNN) ensemble architecture to fully exploit the complementary nature of the diverse features obtained from raw acoustic features. We examined the penultimate discriminative representations instead of employing the features acquired from the output layer. The learned representations were also fused to produce a new feature vector, which was then classified using the Extreme Learning Machine (ELM). In addition, a genetic algorithm (GA) was created to optimize the parameters globally. The experimental results showed that our proposed system fully exploited the various features and produced a high-quality IBM under different conditions.
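The pipeline the abstract describes — several DNNs trained on diverse acoustic features, their penultimate-layer representations concatenated into a fused feature vector, and an ELM classifying each time-frequency unit of the IBM — can be sketched in numpy. This is a minimal illustration, not the authors' implementation: the DNNs here are random and untrained stand-ins, fusion is assumed to be plain concatenation, the ELM uses ridge-regularized least squares for its output weights, and the GA parameter search is omitted. All dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def penultimate(x, layers):
    # Forward pass through a small DNN, returning the penultimate-layer
    # activations (the learned representation) rather than the output layer.
    h = x
    for W, b in layers:
        h = np.tanh(h @ W + b)
    return h

def elm_train(H, T, n_hidden=256, reg=1e-3):
    # Extreme Learning Machine: random input weights and biases; output
    # weights solved in closed form by regularized least squares.
    Win = rng.normal(size=(H.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    A = np.tanh(H @ Win + b)
    beta = np.linalg.solve(A.T @ A + reg * np.eye(n_hidden), A.T @ T)
    return Win, b, beta

def elm_predict(H, model):
    Win, b, beta = model
    return np.tanh(H @ Win + b) @ beta

# Toy data: 200 frames, two hypothetical feature streams (e.g. spectral
# and cepstral), and a 16-band IBM with one 0/1 label per T-F unit.
n, d1, d2, f = 200, 40, 30, 16
X1, X2 = rng.normal(size=(n, d1)), rng.normal(size=(n, d2))
ibm = (rng.random(size=(n, f)) > 0.5).astype(float)

# Random weights stand in for the trained ensemble members.
dnn1 = [(rng.normal(size=(d1, 64)), rng.normal(size=64)),
        (rng.normal(size=(64, 32)), rng.normal(size=32))]
dnn2 = [(rng.normal(size=(d2, 64)), rng.normal(size=64)),
        (rng.normal(size=(64, 32)), rng.normal(size=32))]

# Fuse the penultimate representations into one feature vector per frame,
# then train the ELM to predict the mask and threshold its output.
fused = np.concatenate([penultimate(X1, dnn1), penultimate(X2, dnn2)], axis=1)
model = elm_train(fused, ibm)
mask = (elm_predict(fused, model) > 0.5).astype(float)
print(mask.shape)  # one binary value per time-frequency unit
```

In the paper's actual system the ensemble members are trained networks and a GA tunes the global parameters; the closed-form ELM solve shown here is what makes the final classification stage cheap compared with backpropagating through another deep network.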
Pages: 24