Bispectral feature speech intelligibility assessment metric based on auditory model

Times Cited: 2
Authors
Chen, Xiaomei [1 ]
Wang, Xiaowei [1 ]
Zhong, Bo [2 ]
Yang, Jiayan [3 ]
Shang, Yingying [3 ]
Affiliations
[1] North China Elect Power Univ, Dept Elect & Elect Engn, Beijing 102206, Peoples R China
[2] Natl Inst Metrol, Div Mech & Acoust Metrol, Beijing 100029, Peoples R China
[3] Chinese Acad Med Sci, Peking Union Med Coll Hosp, Dept Otolaryngol, Beijing 100730, Peoples R China
Keywords
Speech intelligibility; Gammatone filter banks; Inner hair cell; Auditory model; Bispectrum; PREDICTION; INDEX; QUALITY; REVERBERANT
DOI
10.1016/j.csl.2023.101492
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A bispectral-feature-based predictive speech intelligibility metric (GMBSIM) is proposed, built on a refined functional auditory model of the ear. In this model, Gammatone filter banks and the Meddis inner hair cell model are combined to simulate ear function. The input speech signal is divided into 32 auditory subbands, each subband signal is passed through the inner hair cell model, and the bispectrum of each subband signal is estimated frame by frame in the time domain. Bispectral features are then extracted and selected to compute speech intelligibility. The proposed GMBSIM has relatively low computational complexity because it omits the spectrogram or neurogram image transformation, and its account of the ear's perception and processing of speech gives it an advantage over classical metrics. Finally, the GMBSIM metric is verified to perform favorably across a range of conditions spanning reverberation, additive noise, and distortions such as jitter, suggesting it can be applied in most kinds of complex background noise environments.
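The core signal-processing step of the abstract, estimating a subband signal's bispectrum frame by frame, can be sketched with a standard direct (frame-averaged) estimator. This is a generic illustration, not the authors' implementation; the function names `frame_signal` and `bispectrum` and all parameter defaults are assumptions for the sketch.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def bispectrum(x, frame_len=256, hop=128):
    """Direct bispectrum estimate B(f1, f2) = E[X(f1) X(f2) X*(f1+f2)],
    averaged over windowed frames of the (e.g. subband) signal x."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    X = np.fft.fft(frames, axis=1)                      # (n_frames, N)
    f = np.arange(frame_len)
    idx = (f[:, None] + f[None, :]) % frame_len          # index of f1 + f2 (mod N)
    # Outer product X(f1) X(f2) times conj(X(f1+f2)), averaged over frames
    return np.mean(X[:, :, None] * X[:, None, :] * np.conj(X[:, idx]), axis=0)
```

In the metric described above, an estimator of this kind would be applied to each of the 32 inner-hair-cell-processed subband signals, after which scalar features are extracted from the resulting bispectrum matrices.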
Pages: 17
Related Papers (50 records)
  • [21] Speech based transmission index for all: An intelligibility metric for variable hearing ability
    Mechergui, Nader
    Djaziri-Larbi, Sonia
    Jaidane, Meriem
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (03): 1470-1480
  • [22] Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain
    Relano-Iborra, Helia
    May, Tobias
    Zaar, Johannes
    Scheidiger, Christoph
    Dau, Torsten
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 140 (04): 2670-2679
  • [23] Metric Learning Based Feature Representation with Gated Fusion Model for Speech Emotion Recognition
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    INTERSPEECH 2021, 2021: 4503-4507
  • [24] PATHOLOGICAL SPEECH INTELLIGIBILITY ASSESSMENT BASED ON THE SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE
    Janbakhshi, Parvaneh
    Kodrasi, Ina
    Bourlard, Herve
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6405-6409
  • [25] Single-ended Intelligibility Prediction of Noisy Speech Based on Auditory Features
    Alghamdi, Ahmed
    Chan, Wai-Yip
    2017 IEEE 30TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2017
  • [26] AN AUDITORY-BASED FEATURE FOR ROBUST SPEECH RECOGNITION
    Shao, Yang
    Jin, Zhaozhang
    Wang, DeLiang
    Srinivasan, Soundararajan
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 4625+
  • [27] Auditory based feature vectors for speech recognition systems
    Abdulla, Waleed H.
    Advances in Communications and Software Technologies, 2002: 231-236
  • [28] A series of SNR-based speech intelligibility models in the Auditory Modeling Toolbox
    Lavandier, Mathieu
    Vicente, Thibault
    Prud'homme, Luna
    ACTA ACUSTICA, 2022, 6
  • [29] Visual speech improves the intelligibility of time-expanded auditory speech
    Tanaka, Akihiro
    Sakamoto, Shuichi
    Tsumura, Komi
    Suzuki, Yoiti
    NEUROREPORT, 2009, 20 (05): 473-477
  • [30] Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition
    Chatterjee, Saikat
    Kleijn, W. Bastiaan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (06): 1813-1825