Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis

Cited by: 13
Authors
Janod, Killian [1]
Morchid, Mohamed [2]
Dufour, Richard [2]
Linares, Georges [2]
De Mori, Renato [3]
Affiliations
[1] Univ Avignon, Ctr Enseignement & Rech Informat, F-84911 Avignon, France
[2] Univ Avignon, Lab Informat Avignon, F-84911 Avignon, France
[3] McGill Univ, Comp Sci, Montreal, PQ H3A 2A7, Canada
Keywords
Automatic speech recognition (ASR); denoising autoencoders (DAEs); multilayer neural networks; speech analytics; stacked autoencoders (SAEs); ARCHITECTURES
DOI
10.1109/TASLP.2017.2718843
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic transcription of spoken documents is affected by recognition errors that are especially frequent when speech is acquired in severely noisy conditions. Automatic speech recognition errors induce errors in the linguistic features used for a variety of natural language processing tasks. Recently, denoising autoencoders (DAEs) and stacked autoencoders (SAEs) have been proposed with interesting results for acoustic feature denoising tasks. This paper deals with the recovery of corrupted linguistic features in spoken documents. Solutions based on DAEs and SAEs are considered and evaluated in a spoken conversation analysis task. In order to improve conversation theme classification accuracy, the possibility of combining abstractions obtained from manual and automatic transcription features is considered. As a result, two original representations of highly imperfect spoken documents are introduced. They are based on bottleneck features of a supervised autoencoder that takes advantage of both noisy and clean transcriptions to improve the robustness of error-prone representations. Experimental results on a spoken conversation theme identification task show substantial accuracy improvements obtained with the proposed recovery of corrupted features.
Pages: 1505-1516
Number of pages: 12