Time-frequency representations in speech perception

Cited by: 12

Authors:

Gomez-Vilda, Pedro ^{[1]}

Ferrandez-Vicente, Jose M. ^{[2]}

Rodellar-Biarge, Victoria ^{[1]}

Fernandez-Baillo, Roberto ^{[1]}

Affiliations:

[1] Univ Politecn Madrid, Fac Informat, E-28660 Madrid, Spain

[2] Univ Politecn Cartagena, Cartagena 30202, Spain

Source:

NEUROCOMPUTING | 2009 / Vol. 72 / Issues 4-6

Keywords:

Bio-inspired speech processing; Speech perception; Acoustic-phonetics; Phonetic boundaries and classes; Minimal semantic units; ORGANIZATION; INTEGRATION; DOMAIN;

DOI:

10.1016/j.neucom.2008.04.056

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current applications demand a comprehensive view of voice and speech perception in order to build more complex and competitive procedures capable of extracting as much knowledge as possible from sound-based human communication. Many tasks that extract knowledge from speech and voice may share signal-treatment procedures that can be devised from the point of view of bio-inspiration. The present paper examines a hierarchy of sound-processing functionalities at the auditory and perceptual levels of the auditory neural pathways that can be translated into bio-inspired speech-processing techniques, and analyzes their fundamental characteristics in relation to current tendencies in cognitive audio processing. The pathways linking the peripheral auditory system (cochlear complex) with the brain cortex are briefly examined, with special attention to neuronal structures that show specific capabilities for formant analysis and for building up a semantic hierarchy from the time-frequency structure of speech. The goal is to explore their capability of conveying semantics to speech processing and understanding, starting from the minimal acoustic cues with elementary meaning, or "sematoms". The replication of known biological functionality by algorithmic methods through bio-inspiration is a secondary aim of the research. Examples extracted from speech-processing tasks in the domain of acoustic-phonetics are presented. These may be applicable to speech recognition, speaker characterization and biometry, emotion detection, and related tasks. (C) 2008 Elsevier B.V. All rights reserved.


Pages: 820-830

Number of pages: 11

50 records in total

[21] The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech
Madmoni, Lior
Tibor, Shir
Nelken, Israel
Rafaely, Boaz
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2037 - 2047
[22] PHASE RECONSTRUCTION WITH LEARNED TIME-FREQUENCY REPRESENTATIONS FOR SINGLE-CHANNEL SPEECH SEPARATION
Wichern, Gordon
Le Roux, Jonathan
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 396 - 400
[23] Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
Baghel, Shikha
Prasanna, S. R. M.
Guha, Prithwijit
SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 33 - 43
[24] THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND TIME-FREQUENCY REPRESENTATIONS
LOVELL, BC
WILLIAMSON, RC
BOASHASH, B
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1993, 41 (03) : 1458 - 1461
[25] On time-frequency masking in voiced speech
Skoglund, J
Kleijn, WB
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04) : 361 - 369
[26] Time-frequency methods for enhancing speech
Kenny, OP
Nelson, DJ
ADVANCED SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, AND IMPLEMENTATIONS VII, 1997, 3162 : 48 - 57
[27] Improve Speech Enhancement using Perception-High-Related Time-Frequency Loss
Zhao, Ding
Zhang, Zhan
Yu, Bin
Wang, Yuehai
INTERSPEECH 2022, 2022, : 5483 - 5487
[28] Adapted and Adaptive Linear Time-Frequency Representations
Balazs, Peter
Doerfler, Monika
Kowalski, Matthieu
Torresani, Bruno
IEEE SIGNAL PROCESSING MAGAZINE, 2013, 30 (06) : 20 - 31
[29] Localized subclasses of quadratic time-frequency representations
Papandreou-Suppappola, A
Murray, RL
Boudreaux-Bartels, GF
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 2041 - 2044
[30] Time-Frequency Representations: Spectrogram, Cochleogram and Correlogram
Chaurasiya, Himanshu
INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 : 1901 - 1910
