JOINT ENCODING OF THE WAVEFORM AND SPEECH RECOGNITION FEATURES USING A TRANSFORM CODEC

Cited by: 0
Authors
Fan, Xing [1]
Seltzer, Michael L. [1]
Droppo, Jasha [1]
Malvar, Henrique S. [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052, USA
Keywords
transform coding; speech coding; distributed speech recognition; Siren codec
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation on an in-car speech recognition task shows that at 16 kbps the new system typically has essentially no impact on word error rate relative to uncompressed audio, whereas the standard transform codec increases word error rate by up to 20%. In addition, the decoded speech is of good quality for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
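The abstract states that narrowband features are synthesized at the server by transforming the decoded wideband features, but it does not specify the transformation. The following is a minimal Python sketch under stated assumptions, not the authors' method: it assumes the features are linear-domain filterbank energies and that the transformation is a single linear matrix fitted by least squares on paired wideband/narrowband training frames. The function names `fit_wb_to_nb_transform` and `synthesize_nb_features`, the 24- and 16-bin feature sizes, and the synthetic data are hypothetical.

```python
import numpy as np

# Assumption (not from the paper): features are linear-domain filterbank energies,
# and the wideband-to-narrowband transformation is one linear matrix.

def fit_wb_to_nb_transform(wb_feats: np.ndarray, nb_feats: np.ndarray) -> np.ndarray:
    """Least-squares fit of A such that nb_feats ≈ wb_feats @ A.

    wb_feats: (num_frames, num_wb_bins) wideband features
    nb_feats: (num_frames, num_nb_bins) narrowband features of the same audio
    """
    A, _residuals, _rank, _sv = np.linalg.lstsq(wb_feats, nb_feats, rcond=None)
    return A


def synthesize_nb_features(decoded_wb: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Server side: map decoded wideband frames (rows) to narrowband frames."""
    return decoded_wb @ A


# Usage with synthetic data (hypothetical sizes: 24 wideband bins, 16 narrowband bins).
rng = np.random.default_rng(0)
true_A = rng.random((24, 16))
wb_train = rng.random((500, 24))   # 500 paired training frames
nb_train = wb_train @ true_A       # narrowband targets generated by the known map
A = fit_wb_to_nb_transform(wb_train, nb_train)
print(np.allclose(A, true_A))      # True: the exact linear map is recovered

decoded_wb = rng.random((3, 24))   # stand-in for three decoded wideband frames
print(synthesize_nb_features(decoded_wb, A).shape)  # (3, 16)
```

A linear map is used here only because it is the simplest transformation that can be learned from paired frames; in a cepstral front end, log compression and a DCT would normally follow the mapping.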
Pages: 5148-5151
Number of pages: 4
Related papers
50 in total
  • [21] EEG Waveform Classification Using Transform Domain Features and SVM
    Patil, Hemprasad Y.
    Patil, Priyanka B.
    Baji, Seema R.
    Darade, Rohini S.
    COMPUTING, COMMUNICATION AND SIGNAL PROCESSING, ICCASP 2018, 2019, 810 : 791 - 798
  • [22] A 1.9 kbps Zinc function excited, waveform interpolated speech codec
    Brooks, FCA
    Hanzo, L
    GLOBECOM 98: IEEE GLOBECOM 1998 - CONFERENCE RECORD, VOLS 1-6: THE BRIDGE TO GLOBAL INTEGRATION, 1998, : 804 - 808
  • [23] Adaptive-Order Fractional Fourier Transform Features for Speech Recognition
    Yin Hui
    Xie Xiang
    Kuang Jingming
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 654 - 657
  • [24] Speech emotion recognition using Ramanujan Fourier Transform
    Flower, T. Mary Little
    Jaya, T.
    APPLIED ACOUSTICS, 2022, 201
  • [25] Robust speech recognition using harmonic features
    Goh, Yeh Huann
    Raveendran, Paramesran
    Jamuar, Sudhanshu Shekhar
    IET SIGNAL PROCESSING, 2014, 8 (02) : 167 - 175
  • [26] Speech Emotion Recognition using Combination of Features
    Zhang, Qingli
    An, Ning
    Wang, Kunxia
    Ren, Fuji
    Li, Lian
    PROCEEDINGS OF THE 2013 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND INFORMATION PROCESSING (ICICIP), 2013, : 523 - 528
  • [27] Speech recognition using cepstral articulatory features
    Najnin, Shamima
    Banerjee, Bonny
    SPEECH COMMUNICATION, 2019, 107 : 26 - 37
  • [28] A method to compensate the influence of speech codec in speaker recognition
    Calvo de Lara, Jose R.
    Reyes Diaz, Flavio J.
    Hernandez Sierra, Gabriel
    Jimenez Alcazar, Orlando
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2018, 21 (04) : 975 - 985
  • [29] VISUAL SPEECH RECOGNITION FOR ISOLATED DIGITS USING DISCRETE COSINE TRANSFORM AND LOCAL BINARY PATTERN FEATURES
    Jain, Abhilash
    Rathna, G. N.
    2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 368 - 372
  • [30] Color pattern recognition using Mach-Zehnder nonzero order joint transform correlator with image encoding
    Lee, Chungcheng
    Hou, Yanan
    Ku, Kaining
    Wang, Chunmin
    Chang, Chiehpo
    Chen, Chulung
    IMECS 2007: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2007, : 1923 - +