Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3
Authors
Wijayakusuma, Alfian [1]
Gozali, Davin Reinaldo [1]
Widjaja, Anthony [1]
Ham, Hanry [1]
Affiliation
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021, Vol. 179
Keywords
Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time
DOI
10.1016/j.procs.2021.01.065
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). The research experiments with several RNN architectures, batch sizes, and optimizers as hyperparameters in implementing TasNet and DPRNN, and analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that completes the speaker-independent multi-talker speech separation task in real time with higher accuracy and lower latency than the models of previous research. (C) 2021 The Authors. Published by Elsevier B.V.
Pages: 762-772
Number of pages: 11