Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis

Cited by: 13
Authors
Janod, Killian [1 ]
Morchid, Mohamed [2 ]
Dufour, Richard [2 ]
Linares, Georges [2 ]
De Mori, Renato [3 ]
Affiliations
[1] Univ Avignon, Ctr Enseignement & Rech Informat, F-84911 Avignon, France
[2] Univ Avignon, Lab Informat Avignon, F-84911 Avignon, France
[3] McGill Univ, Comp Sci, Montreal, PQ H3A 2A7, Canada
Keywords
Automatic speech recognition (ASR); denoising autoencoders (DAEs); multilayer neural networks; speech analytics; stacked autoencoders (SAEs); architectures
DOI
10.1109/TASLP.2017.2718843
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Automatic transcription of spoken documents is affected by recognition errors that are especially frequent when speech is acquired in severely noisy conditions. Automatic speech recognition errors induce errors in the linguistic features used for a variety of natural language processing tasks. Recently, denoising autoencoders (DAEs) and stacked autoencoders (SAEs) have been proposed with interesting results for acoustic feature denoising tasks. This paper deals with the recovery of corrupted linguistic features in spoken documents. Solutions based on DAEs and SAEs are considered and evaluated in a spoken conversation analysis task. In order to improve conversation theme classification accuracy, the possibility of combining abstractions obtained from manual and automatic transcription features is considered. As a result, two original representations of highly imperfect spoken documents are introduced. They are based on the bottleneck features of a supervised autoencoder that takes advantage of both noisy and clean transcriptions to improve the robustness of error-prone representations. Experimental results on a spoken conversation theme identification task show substantial accuracy improvements obtained with the proposed recovery of corrupted features.
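To make the general idea concrete, the following is a minimal sketch, not the authors' implementation: a denoising autoencoder whose bottleneck code is trained so that noisy ASR-derived document features reconstruct the corresponding features from manual (clean) transcriptions. The feature dimensions, layer sizes, activations, and mean-squared-error objective are all illustrative assumptions.

# Minimal sketch (illustrative only) of a bottleneck denoising autoencoder that
# maps noisy ASR-derived document features toward clean manual-transcription
# features; the bottleneck code would serve as the denoised representation.
import torch
import torch.nn as nn

class BottleneckDAE(nn.Module):
    def __init__(self, in_dim=2000, bottleneck_dim=100):
        super().__init__()
        # Encoder compresses the noisy input into a small bottleneck code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 500), nn.Tanh(),
            nn.Linear(500, bottleneck_dim), nn.Tanh(),
        )
        # Decoder reconstructs the clean target from the bottleneck code.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 500), nn.Tanh(),
            nn.Linear(500, in_dim),
        )

    def forward(self, x_noisy):
        code = self.encoder(x_noisy)      # bottleneck features used downstream
        return self.decoder(code), code

def train_step(model, optimizer, x_noisy, x_clean):
    """One denoising step: reconstruct clean features from noisy ones."""
    optimizer.zero_grad()
    recon, _ = model(x_noisy)
    loss = nn.functional.mse_loss(recon, x_clean)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy random tensors stand in for paired (ASR, manual) transcription features.
    model = BottleneckDAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_noisy = torch.rand(32, 2000)
    x_clean = torch.rand(32, 2000)
    print(train_step(model, opt, x_noisy, x_clean))

In such a setup, the trained encoder's bottleneck output (rather than the reconstruction) would be fed to a downstream classifier, in line with the abstract's use of bottleneck features for conversation theme identification; the details here remain assumptions.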
Pages: 1505 - 1516
Page count: 12