PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Cited by: 0
Authors
Du, Zhihao [1]
Lei, Ming [2]
Han, Jiqing [1]
Zhang, Shiliang [2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network;
DOI
10.1109/icassp40776.2020.9054334
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Current methods for monaural speech enhancement rely only on acoustic information and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by exploiting phonetic information through phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that uses noisy PPGs for speech enhancement. Since PPG prediction and speech enhancement benefit from each other, a PPG predictor is incorporated into the PAN, and an iterative training algorithm is proposed for it. Experimental results show that using phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality, and character error rate. To the best of our knowledge, this is the first work to introduce PPGs into speech enhancement.
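For readers unfamiliar with the term, a phonetic posteriorgram is a per-frame probability distribution over phoneme classes. The sketch below only illustrates that definition and one plausible way of feeding it to an enhancement model, namely appending the PPG to each acoustic frame. The record does not describe PAN's architecture, so the concatenation step, the function names, and the toy values are all assumptions for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Turn one frame of phoneme logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posteriorgram(frame_logits):
    """Build a PPG: one posterior over phoneme classes per time frame."""
    return [softmax(frame) for frame in frame_logits]


def append_ppg(acoustic_frames, ppg):
    """Hypothetical conditioning step (an assumption, not taken from the
    paper): concatenate each acoustic feature frame with its PPG frame."""
    if len(acoustic_frames) != len(ppg):
        raise ValueError("acoustic features and PPG must have the same number of frames")
    return [list(a) + list(p) for a, p in zip(acoustic_frames, ppg)]


# Toy data: 2 frames, 3 phoneme classes, 2 acoustic features per frame.
logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]
features = [[0.3, 0.7], [0.9, 0.2]]

ppg = posteriorgram(logits)
conditioned = append_ppg(features, ppg)
```

Each row of `ppg` sums to one, and each row of `conditioned` holds the two acoustic features followed by the three phoneme posteriors for that frame.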
Pages: 6634 - 6638
Page count: 5
Related papers
50 in total
  • [1] Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network
    Hou, Nana
    Xu, Chenglin
    Van Tung Pham
    Zhou, Joey Tianyi
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 4064 - 4068
  • [2] Phoneme-dependent NMF for speech enhancement in monaural mixtures
    Raj, Bhiksha
    Singh, Rita
    Virtanen, Tuomas
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1224 - +
  • [3] Convolutional fusion network for monaural speech enhancement
    Xian, Yang
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    NEURAL NETWORKS, 2021, 143 : 97 - 107
  • [4] Additive Phoneme-aware Margin Softmax Loss for Language Recognition
    Li, Zheng
    Liu, Yan
    Li, Lin
    Hong, Qingyang
    INTERSPEECH 2021, 2021, : 3276 - 3280
  • [5] SpecMNet: Spectrum mend network for monaural speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Yi, Jiangyan
    Lv, Zhao
    Tao, Jianhua
    Li, Taihao
    Pei, Guanxiong
    Wu, Xiaopei
    Li, Sheng
    APPLIED ACOUSTICS, 2022, 194
  • [6] A Recursive Network with Dynamic Attention for Monaural Speech Enhancement
    Li, Andong
    Zheng, Chengshi
    Fan, Cunhang
    Peng, Renhua
    Li, Xiaodong
    INTERSPEECH 2020, 2020, : 2422 - 2426
  • [7] Scale-aware dual-branch complex convolutional recurrent network for monaural speech enhancement
    Li, Yihao
    Sun, Meng
    Zhang, Xiongwei
    Van Hamme, Hugo
    COMPUTER SPEECH AND LANGUAGE, 2024, 86
  • [8] Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
    Futami, Hayato
    Tsunoo, Emiru
    Kashiwagi, Yosuke
    Ogawa, Hiroaki
    Arora, Siddhant
    Watanabe, Shinji
    arXiv, 2023,
  • [9] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    INTERSPEECH 2020, 2020, : 309 - 313
  • [10] MRGAN: LightWeight Monaural Speech Enhancement Using GAN Network
    Meng, Chunyu
    Wei, Guangcun
    Long, Yanhong
    Kong, Chuike
    Ma, Penghao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 369 - 377