Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3
Authors
Wijayakusuma, Alfian [1]
Gozali, Davin Reinaldo [1]
Widjaja, Anthony [1]
Ham, Hanry [1]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179
Keywords
Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time;
DOI
10.1016/j.procs.2021.01.065
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of this research is to develop a model that performs real-time, speaker-independent, multi-talker speech separation in the time domain using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN). The research experiments with several RNN architectures, batch sizes, and optimizers as hyperparameters for implementing TasNet and DPRNN, and analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that completes the speaker-independent multi-talker speech separation task in real time with higher accuracy and lower latency than the models from previous research. (C) 2021 The Authors. Published by Elsevier B.V.
Pages: 762 - 772
Number of pages: 11
Related papers
50 in total
  • [21] Time-domain adaptive attention network for single-channel speech separation
    Kunpeng Wang
    Hao Zhou
    Jingxiang Cai
    Wenna Li
    Juan Yao
    EURASIP Journal on Audio, Speech, and Music Processing, 2023
  • [22] Time-domain adaptive attention network for single-channel speech separation
    Wang, Kunpeng
    Zhou, Hao
    Cai, Jingxiang
    Li, Wenna
    Yao, Juan
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [23] Audio Source Separation from a Monaural Mixture Using Convolutional Neural Network in the Time Domain
    Zhang, Peng
    Ma, Xiaohong
    Ding, Shuxue
    ADVANCES IN NEURAL NETWORKS, PT II, 2017, 10262 : 388 - 395
  • [24] Overlapped spectral demodulation of fiber Bragg grating using convolutional time-domain audio separation network
    Shan, Linlin
    Yu, Mingxin
    Xia, Jiabin
    Xin, Jingtao
    Deng, Chaofan
    Zhu, Lianqing
    OPTICAL ENGINEERING, 2023, 62 (06)
  • [25] Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
    Girirajan, S.
    Pandian, A.
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02): 1987 - 2001
  • [26] Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet
    Yuenyong, Sumeth
    Hnoohom, Narit
    Wongpatikaseree, Konlakorn
    Singkul, Sattaya
    2022 7TH INTERNATIONAL CONFERENCE ON BUSINESS AND INDUSTRIAL RESEARCH (ICBIR2022), 2022, : 78 - 83
  • [27] Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation
    Yang, Xue
    Bao, Changchun
    INTERSPEECH 2022, 2022, : 5338 - 5342
  • [28] A new time-domain detection approach with blind separation based on neural network
    Li, Guan Nan
    Zheng, Feng
    Yang, Li
    Zhang, Ning
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, VOL I, 2009, : 565 - +
  • [29] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6875 - 6879
  • [30] An Implementation of Real-Time Audio Monitoring in Network Camera
    Yuan, Xuehao
    Zhang, Yumeng
    Li, Hui
    MULTIMEDIA AND SIGNAL PROCESSING, 2012, 346 : 412 - 419