Implementation of Real-Time Speech Separation Model Using Time-Domain Audio Separation Network (TasNet) and Dual-Path Recurrent Neural Network (DPRNN)

Cited by: 3

Authors:

Wijayakusuma, Alfian [1]

Gozali, Davin Reinaldo [1]

Widjaja, Anthony [1]

Ham, Hanry [1]

Affiliations:

[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia

Source:

5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021 / Vol. 179

Keywords:

Speech Separation; Time-Domain; Time-Domain Audio Separation Network; Dual-Path Recurrent Neural Network; Real-Time;

DOI:

10.1016/j.procs.2021.01.065

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The purpose of this research is to develop a model that can perform real-time, speaker-independent, multi-talker speech separation in the time domain using the Time-Domain Audio Separation Network (TasNet) and the Dual-Path Recurrent Neural Network (DPRNN). To implement TasNet and DPRNN, the research runs experiments that treat the RNN architecture, the batch size, and the optimizer as hyperparameters, and it analyzes the impact of these hyperparameter settings on model performance. The expected result is a model that completes the speaker-independent multi-talker speech separation task in real time with higher accuracy and lower latency than the models from previous research. (C) 2021 The Authors. Published by Elsevier B.V.

Pages: 762-772

Number of pages: 11
