Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3
Authors
Wijayakusuma, Alfian [1]
Gozali, Davin Reinaldo [1]
Widjaja, Anthony [1]
Ham, Hanry [1]
Affiliation
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179
Keywords
Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time;
DOI
10.1016/j.procs.2021.01.065
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). The research experiments with several RNN architectures, batch sizes, and optimizers as hyperparameters for implementing TasNet and DPRNN, and analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that is more accurate and has lower latency than models from previous research when performing speaker-independent multi-talker speech separation in real time. (C) 2021 The Authors. Published by Elsevier B.V.
Pages: 762-772
Number of pages: 11