DETECTION, SEPARATION AND RECOGNITION OF SPEECH FROM CONTINUOUS SIGNALS USING SPECTRAL FACTORISATION

Authors
Hurmalainen, Antti [1]
Gemmeke, Jort F. [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, POB 553, FI-33101 Tampere, Finland
[2] Katholieke Univ Leuven, Dept ESAT PSI, B-3001 Heverlee, Belgium
Source
2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2012
Keywords
Spectral factorization; speech recognition; speaker recognition; voice activity detection; speech separation
DOI
Not available
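Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809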
Abstract
In real-world speech processing, signals are often continuous and consist of momentary segments of speech over non-stationary background noise. Spectral factorisation using multi-frame atoms has been shown to separate and recognise speech successfully in adverse conditions. Whereas previous work used full knowledge of utterance endpoints and speaker identity for noise modelling and speech recognition, this study proposes spectral factorisation and sparse classification techniques to detect, identify, separate and recognise speech from a continuous noisy input. Speech models are trained beforehand, but noise models are acquired adaptively from the input by using voice activity detection, without prior knowledge of noise-only locations. The results are evaluated on the CHiME corpus, which contains utterances from 34 speakers over highly non-stationary multi-source noise.
Pages: 2649-2653
Page count: 5