Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliation
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Source
Interspeech 2021
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification numbers
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
Phonetic analysis often requires reliable estimation of formants, but the estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several other trackers, but these trackers require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking that achieves accuracy similar to that of [1]. Frame-wise spectral envelopes serve as the input. The outputs are estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and an envelope generated from the estimated formant parameters, is calculated and backpropagated through the network to compute the gradients with respect to the formant parameters. The approach is similar to that of autoencoders in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
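The analysis-by-synthesis idea in the abstract can be illustrated with a small sketch. An envelope is regenerated from pole and zero frequencies and bandwidths. A loss compares that envelope with the input envelope, and the parameters are then moved downhill on that loss. Several parts are assumptions and do not come from the paper:

- The parameter layout is hypothetical.
- The usual mapping from bandwidth to root radius, r = exp(-pi*B/fs), is used.
- A single overall gain in dB stands in for the paper's "amplitude adjustments".
- A mean squared dB error stands in for the unspecified custom loss.
- The DNN and backpropagation are replaced by directly optimized parameters and finite-difference gradients.

All function names are illustrative only.

```python
import cmath
import math

# Hypothetical parameter layout (an assumption, not the paper's):
# [F1, B1, ..., Fp, Bp, Z1, ZB1, ..., Zq, ZBq, gain_db], frequencies in Hz.


def _root(center_hz, bandwidth_hz, fs):
    """Complex root of a resonance: radius from bandwidth, angle from center frequency."""
    # Clamp the bandwidth so the root stays strictly inside the unit circle.
    radius = math.exp(-math.pi * max(bandwidth_hz, 1.0) / fs)
    return radius * cmath.exp(2j * math.pi * center_hz / fs)


def synthesize_envelope_db(params, n_poles, n_zeros, freqs_hz, fs):
    """Log-magnitude (dB) of a pole-zero model evaluated at freqs_hz."""
    poles = [_root(params[2 * k], params[2 * k + 1], fs) for k in range(n_poles)]
    off = 2 * n_poles
    zeros = [_root(params[off + 2 * k], params[off + 2 * k + 1], fs) for k in range(n_zeros)]
    gain_db = params[off + 2 * n_zeros]  # assumed single overall gain
    envelope = []
    for f in freqs_hz:
        z_inv = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
        h = 1 + 0j
        for q in zeros:  # conjugate pair of numerator factors per zero
            h *= (1 - q * z_inv) * (1 - q.conjugate() * z_inv)
        for p in poles:  # conjugate pair of denominator factors per pole
            h /= (1 - p * z_inv) * (1 - p.conjugate() * z_inv)
        envelope.append(gain_db + 20 * math.log10(abs(h)))
    return envelope


def spectral_loss(params, target_db, n_poles, n_zeros, freqs_hz, fs):
    """Assumed loss: mean squared dB difference between target and resynthesis."""
    est = synthesize_envelope_db(params, n_poles, n_zeros, freqs_hz, fs)
    return sum((a - b) ** 2 for a, b in zip(est, target_db)) / len(target_db)


def numerical_grad(loss_fn, params, eps=1e-3):
    """Central finite differences; a stand-in for backpropagation through a DNN."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def fit_formant_params(target_db, init, n_poles, n_zeros, freqs_hz, fs, steps=20):
    """Gradient descent with backtracking; the synthesizer ('decoder') is fixed."""
    def loss_fn(p):
        return spectral_loss(p, target_db, n_poles, n_zeros, freqs_hz, fs)

    params, loss = list(init), loss_fn(init)
    for _ in range(steps):
        grad = numerical_grad(loss_fn, params)
        lr = 1.0
        while lr > 1e-9:  # halve the step until the loss actually decreases
            trial = [p - lr * g for p, g in zip(params, grad)]
            trial_loss = loss_fn(trial)
            if trial_loss < loss:
                params, loss = trial, trial_loss
                break
            lr *= 0.5
    return params, loss


# Illustrative usage: three poles, one zero, 10 kHz sampling, 50 Hz grid.
freqs = [50.0 * i for i in range(100)]
true_params = [500, 80, 1500, 100, 2500, 120, 3500, 200, 0.0]
target = synthesize_envelope_db(true_params, 3, 1, freqs, 10000)
init = [600, 100, 1400, 120, 2600, 150, 3400, 250, 3.0]
fitted, final_loss = fit_formant_params(target, init, 3, 1, freqs, 10000)
```

In the paper's setting, a network maps each frame's envelope to these parameters and gradients flow back into the network weights. Here the parameters for one frame are optimized directly, so this is only an illustration of the reconstruction objective.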
Pages: 1189-1193
Number of pages: 5
Related papers
50 in total
  • [41] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, pp. 633-637
  • [42] Threats of Adversarial Attacks in DNN-Based Modulation Recognition
    Lin, Yun
    Zhao, Haojun
    Tu, Ya
    Mao, Shiwen
    Dou, Zheng
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020, pp. 2469-2478
  • [43] A DNN-based Post Filter for Geometric Source Separation
    Chen, Chenghao
    Zhou, Yi
    Liu, Hongqing
    2018 INTERNATIONAL SEMINAR ON COMPUTER SCIENCE AND ENGINEERING TECHNOLOGY (SCSET 2018), 2019, vol. 1176
  • [44] DNN-Based Semantic Rescoring Models for Speech Recognition
    Illina, Irina
    Fohr, Dominique
    TEXT, SPEECH, AND DIALOGUE, TSD 2021, 2021, vol. 12848, pp. 357-370
  • [45] DNN-based phase estimation for online speech enhancement
    Nguyen, Binh Thien
    Wakabayashi, Yukoh
    Geng, Yuting
    Iwai, Kenta
    Nishiura, Takanobu
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2025, vol. 46, no. 2, pp. 186-190
  • [46] DNN-based Approach to Detect and Classify Pathological Voice
    Chuang, Zong-Ying
    Yu, Xiao-Tong
    Chen, Ji-Ying
    Hsu, Yi-Te
    Xu, Zhe-Zhuang
    Wang, Chi-Te
    Lin, Feng-Chuan
    Fang, Shih-Hau
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, pp. 5238-5241
  • [47] A DNN-based semantic segmentation for detecting weed and crop
    You, Jie
    Liu, Wei
    Lee, Joonwhoan
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2020, vol. 178
  • [48] Analyzing Decision Polygons of DNN-based Classification Methods
    Kim, Jongyoung
    Woo, Seongyoun
    Lee, Wonjun
    Kim, Donghwan
    Lee, Chulhee
    ICINCO: PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS, 2020, pp. 346-351
  • [49] DNN-based Indoor Fingerprinting Localization with WiFi FTM
    Eberechukwu, Paulson
    Park, Hyunwoo
    Laoudias, Christos
    Horsmanheimo, Seppo
    Kim, Sunwoo
    2022 23RD IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2022), 2022, pp. 367-371
  • [50] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, pp. 5540-5544