Sound Retrieval and Ranking Using Sparse Auditory Representations

Cited by: 37

Authors:

Lyon, Richard F. [1]

Rehn, Martin [1]

Bengio, Samy [1]

Walters, Thomas C. [1]

Chechik, Gal [1]

Affiliation:

[1] Google, Mountain View, CA 94043 USA

Source:

NEURAL COMPUTATION | 2010 / Vol. 22 / No. 9

Keywords:

MODEL

DOI:

10.1162/NECO_a_00011

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
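
As a rough illustration of the kind of learning the abstract describes (a linear mapping from a sparse feature space to a query-term space, trained in passive-aggressive fashion), the sketch below shows one update step on a single training triplet. It is a minimal sketch under stated assumptions, not the authors' implementation: the bilinear score `q @ W @ x`, the hinge margin of 1, the aggressiveness cap `C`, and the function name `pamir_step` are illustrative choices, and dense NumPy arrays stand in for the very large sparse vectors a real system would use.

```python
import numpy as np

def pamir_step(W, q, x_pos, x_neg, C=1.0):
    """One passive-aggressive ranking update on a triplet (assumed form).

    W      : (n_terms, n_features) linear map, scored as q @ W @ x
    q      : (n_terms,) query vector (e.g. bag of query words)
    x_pos  : (n_features,) features of a sound relevant to q
    x_neg  : (n_features,) features of a sound not relevant to q
    C      : hypothetical cap on the step size
    Returns the updated W and the hinge loss measured before the update.
    """
    diff = x_pos - x_neg
    # Hinge loss: the relevant sound should outscore the irrelevant one by 1.
    loss = max(0.0, 1.0 - float(q @ W @ diff))
    if loss == 0.0:
        return W, loss  # passive: the margin is already satisfied

    # Squared Frobenius norm of the rank-one gradient direction outer(q, diff).
    norm_sq = float(q @ q) * float(diff @ diff)
    if norm_sq == 0.0:
        return W, loss  # nothing to update from a degenerate triplet

    tau = min(C, loss / norm_sq)  # aggressive: smallest step that fixes the margin, capped at C
    return W + tau * np.outer(q, diff), loss


# Toy usage with 2 query terms and 3 features (made-up values).
W0 = np.zeros((2, 3))
q = np.array([1.0, 0.0])
x_pos = np.array([1.0, 0.0, 0.0])
x_neg = np.array([0.0, 1.0, 0.0])
W1, loss_before = pamir_step(W0, q, x_pos, x_neg)
margin_after = float(q @ W1 @ (x_pos - x_neg))
```

Because the update is a rank-one outer product of the query and the feature difference, only the rows and columns touched by nonzero entries change, which is why this style of update suits sparse inputs.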

Pages: 2390-2416

Page count: 27
