Sound Retrieval and Ranking Using Sparse Auditory Representations

Cited by: 37

Authors:

Lyon, Richard F. [1]

Rehn, Martin [1]

Bengio, Samy [1]

Walters, Thomas C. [1]

Chechik, Gal [1]

Affiliation:

[1] Google, Mountain View, CA 94043 USA

Source:

NEURAL COMPUTATION | 2010 / Vol. 22 / No. 09

Keywords:

MODEL;

DOI:

10.1162/NECO_a_00011

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
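The "linear mapping from a very large sparse feature space to a large query-term space" can be illustrated with a passive-aggressive ranking update of the kind PAMIR is built on. The following is only a minimal sketch under stated assumptions: a bilinear score q^T W p, a unit-margin hinge loss on (query, relevant sound, irrelevant sound) triplets, and sparse vectors stored as Python dicts. The function names, the dict layout, and the aggressiveness value C are hypothetical; this record does not give the settings the authors used.

```python
def score(W, q, p):
    """Bilinear relevance score q^T W p over sparse dict vectors.

    W maps (query_term, feature_id) -> weight; missing entries are zero.
    """
    return sum(qv * pv * W.get((t, f), 0.0)
               for t, qv in q.items()
               for f, pv in p.items())


def pa_rank_update(W, q, p_pos, p_neg, C=1.0):
    """One passive-aggressive step on a triplet; updates W in place.

    Returns the hinge loss measured before the update.
    C (assumed value) caps the step size.
    """
    # Feature difference between the relevant and the irrelevant item.
    feats = set(p_pos) | set(p_neg)
    diff = {f: p_pos.get(f, 0.0) - p_neg.get(f, 0.0) for f in feats}

    # Hinge loss: the relevant item should outscore the other by a margin of 1.
    loss = max(0.0, 1.0 - score(W, q, diff))

    # Squared Frobenius norm of the rank-one direction q * diff^T.
    norm_sq = sum(v * v for v in q.values()) * sum(v * v for v in diff.values())
    if loss == 0.0 or norm_sq == 0.0:
        return loss  # passive: margin already satisfied (or nothing to update)

    # Aggressive: smallest step that fixes the margin, capped at C.
    tau = min(C, loss / norm_sq)
    for t, qv in q.items():
        for f, dv in diff.items():
            if dv != 0.0:
                W[(t, f)] = W.get((t, f), 0.0) + tau * qv * dv
    return loss


# Toy usage with made-up feature ids.
W = {}
query = {"bark": 1.0}
relevant = {"code_17": 1.0}
irrelevant = {"code_42": 1.0}
first_loss = pa_rank_update(W, query, relevant, irrelevant)
second_loss = pa_rank_update(W, query, relevant, irrelevant)
```

Because each step touches only the weights indexed by the nonzero query terms and nonzero feature differences, the cost of an update scales with the sparsity of the inputs rather than with the full size of W, which is the property that makes this style of learner practical for very large sparse feature spaces.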


Pages: 2390 - 2416

Page count: 27

50 items in total

[21] Sparse audio representations using the MCLT
Davies, ME
Daudet, L
SIGNAL PROCESSING, 2006, 86 (03) : 457 - 470
[22] Image understanding using sparse representations
Thiagarajan, J.J., 1600, Morgan and Claypool Publishers (15):
[23] Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
Bruch, Sebastian
Nardini, Franco Maria
Rulli, Cosimo
Venturini, Rossano
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024 : 152 - 162
[24] Deep Hashing for Speaker Identification and Retrieval Based on Auditory Sparse Representation
Tran, Dung Kim
Akagi, Masato
Unoki, Masashi
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022 : 937 - 943
[25] Image understanding using sparse representations
Thiagarajan, Jayaraman J.
Ramamurthy, Karthikeyan Natesan
Turaga, Pavan
Spanias, Andreas
Synthesis Lectures on Image, Video, and Multimedia Processing, 2014, 7 (01) : 1 - 120
[26] Inpainting and Zooming Using Sparse Representations
Fadili, M. J.
Starck, J. -L.
Murtagh, F.
COMPUTER JOURNAL, 2009, 52 (01) : 64 - 79
[27] Image Denoising Using Sparse Representations
Valiollahzadeh, SeyyedMajid
Firouzi, Hamed
Babaie-Zadeh, Massoud
Jutten, Christian
INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS, 2009, 5441 : 557 - +
[28] Perceptual organization of sound determines the relevance of neural representations in the auditory hierarchy
Atienza, M
Cantero, JL
Grau, C
Gomez, C
Dominguez-Marin, E
Escera, C
INTERNATIONAL JOURNAL OF PSYCHOPHYSIOLOGY, 2002, 45 (1-2) : 121 - 121
[29] Spatial representations of temporal and spectral sound cues in human auditory cortex
Herdener, Marcus
Esposito, Fabrizio
Scheffler, Klaus
Schneider, Peter
Logothetis, Nikos K.
Uludag, Kamil
Kayser, Christoph
CORTEX, 2013, 49 (10) : 2822 - 2833
[30] Image retrieval using multiple evidence ranking
Coelho, TAS
Calado, PP
Souza, LV
Ribeiro-Neto, B
Muntz, R
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (04) : 408 - 417
