JOINT ENCODING OF THE WAVEFORM AND SPEECH RECOGNITION FEATURES USING A TRANSFORM CODEC

被引:0
|
作者
Fan, Xing [1 ]
Seltzer, Michael L. [1 ]
Droppo, Jasha [1 ]
Malvar, Henrique S. [1 ]
Acero, Alex [1 ]
机构
[1] Microsoft Res, Redmond, WA 98052 USA
关键词
transform coding; speech coding; distributed speech recognition; Siren codec;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband speech acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation conducted on an in-car speech recognition task show that at 16 kbps our new system typically shows essentially no impact in word error rate compared to uncompressed audio, whereas the standard transform codec produces up to a 20% increase in word error rate. In addition, good quality speech is obtained for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
引用
收藏
页码:5148 / 5151
页数:4
相关论文
共 50 条
  • [41] Investigating Low-Distortion Speech Enhancement with Discrete Cosine Transform Features for Robust Speech Recognition
    Tsao, Yu-Sheng
    Hung, Jeih-Weih
    Ho, Kuan-Hsun
    Chen, Berlin
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 131 - 136
  • [42] Estimating Reliability of Speech Recognition Results Using Recognition Results Change Degree When Minute Changes to Speech Waveform
    Iida, Atsuteru
    Nishida, Hikaru
    Wakita, Yumi
    ARTIFICIAL INTELLIGENCE IN HCI, PT II, AI-HCI 2024, 2024, 14735 : 190 - 201
  • [43] Emotion Recognition in Speech Using MFCC and Wavelet Features
    Kishore, K. V. Krishna
    Satish, P. Krishna
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 842 - 847
  • [44] Speech emotion recognition using nonlinear dynamics features
    Shahzadi, Ali
    Ahmadyfard, Alireza
    Harimi, Ali
    Yaghmaie, Khashayar
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2015, 23 : 2056 - 2073
  • [45] Speech recognition using filter-bank features
    Ravindran, S
    Demiroglu, C
    Anderson, DV
    CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 2003, : 1900 - 1903
  • [46] AUDIO SEGMENTATION FOR SPEECH RECOGNITION USING SEGMENT FEATURES
    Rybach, David
    Gollan, Christian
    Schlueter, Ralf
    Ney, Hermann
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4197 - 4200
  • [47] Speech Emotion Recognition Using Magnitude and Phase Features
    Shankar D.R.
    Manjula R.B.
    Biradar R.C.
    SN Computer Science, 5 (5)
  • [48] RECOGNITION OF EMOTION IN SPEECH USING VARIOGRAM BASED FEATURES
    Esmaileyan, Zeynab
    Marvi, Hosein
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2014, 27 (03) : 156 - 170
  • [49] Speech Emotion Recognition Using Minimum Extracted Features
    Abdulsalam, Wisal Hashim
    Alhamdani, Rafah Shihab
    Abdullah, Mohammed Najm
    2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 58 - 61
  • [50] Speech recognition using reconstructed phase space features
    Lindgren, AC
    Johnson, MT
    Povinelli, RJ
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 60 - 63