PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Cited by: 0

Authors:

Du, Zhihao [1]

Lei, Ming [2]

Han, Jiqing [1]

Zhang, Shiliang [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

[2] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network;

DOI:

10.1109/icassp40776.2020.9054334

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Current methods for monaural speech enhancement rely on acoustic information alone and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by exploiting phonetic information through phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that uses the noisy PPGs for speech enhancement. Since PPG prediction and speech enhancement benefit from each other, a PPG predictor is incorporated into the PAN and an iterative training algorithm is proposed for it. Experimental results show that using the phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality and character error rate. To the best of our knowledge, this is the first work to introduce the PPG into speech enhancement.
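
For readers unfamiliar with the term, a phonetic posteriorgram is a frame-by-frame matrix of posterior probabilities over phoneme classes. The sketch below is only an illustration of that idea and of one simple way such a matrix could be combined with acoustic features. It is not the PAN architecture from the paper: the function names, the feature dimensions (257 spectral bins, 40 phoneme classes), the random inputs, and the frame-wise concatenation are all assumptions made for this example.

```python
import numpy as np


def phonetic_posteriorgram(logits):
    """Turn per-frame phoneme logits (frames x phonemes) into a PPG.

    Each row is passed through a numerically stable softmax, so every
    frame becomes a probability distribution over phoneme classes.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def phoneme_aware_input(noisy_features, ppg):
    """Concatenate acoustic features and PPG frame by frame.

    Hypothetical fusion step: the abstract does not say how the PPG is
    fed to the enhancement network, so concatenation is an assumption.
    """
    if noisy_features.shape[0] != ppg.shape[0]:
        raise ValueError("feature and PPG frame counts must match")
    return np.concatenate([noisy_features, ppg], axis=1)


# Random stand-ins for real data (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
num_frames, num_bins, num_phonemes = 100, 257, 40
noisy_features = rng.standard_normal((num_frames, num_bins))
logits = rng.standard_normal((num_frames, num_phonemes))

ppg = phonetic_posteriorgram(logits)
fused = phoneme_aware_input(noisy_features, ppg)
print(fused.shape)  # (100, 297)
```

In a real system the logits would come from a trained phoneme classifier applied to the noisy speech, and the fused features would be the input to an enhancement model. The abstract's iterative training between the PPG predictor and the enhancer is not modeled here.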

Pages: 6634-6638

Number of pages: 5
