Sound Retrieval and Ranking Using Sparse Auditory Representations

Cited by: 37
Authors
Lyon, Richard F. [1 ]
Rehn, Martin [1 ]
Bengio, Samy [1 ]
Walters, Thomas C. [1 ]
Chechik, Gal [1 ]
Affiliation
[1] Google, Mountain View, CA 94043 USA
Keywords
MODEL;
DOI
10.1162/NECO_a_00011
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
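The abstract says only that PAMIR learns a linear mapping from a sparse feature space to a query-term space. As a rough illustration of what one passive-aggressive ranking step could look like, the sketch below scores a sound against a query with a bilinear form and updates the mapping when a relevant sound does not outrank an irrelevant one by a margin of 1. The update rule, the margin, the aggressiveness parameter `C`, and the names `score` and `pa_rank_update` are assumptions made for illustration; they are not taken from this record and should not be read as the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows: query terms, columns: sound features


def score(W: Matrix, query: Vector, doc: Vector) -> float:
    """Bilinear relevance score query^T W doc."""
    return sum(
        q_i * sum(w_ij * d_j for w_ij, d_j in zip(row, doc))
        for q_i, row in zip(query, W)
    )


def pa_rank_update(W: Matrix, query: Vector, doc_pos: Vector,
                   doc_neg: Vector, C: float = 1.0) -> float:
    """Apply one in-place passive-aggressive step to W.

    Returns the hinge loss measured before the update. C (hypothetical
    default 1.0) caps the step size.
    """
    loss = max(0.0, 1.0 - score(W, query, doc_pos) + score(W, query, doc_neg))
    diff = [p - n for p, n in zip(doc_pos, doc_neg)]
    # Squared Frobenius norm of the outer product query * diff^T.
    norm_sq = sum(q * q for q in query) * sum(d * d for d in diff)
    if loss == 0.0 or norm_sq == 0.0:
        return loss  # passive: the margin is already satisfied
    tau = min(C, loss / norm_sq)  # aggressive: smallest step that fixes the margin
    for i, q_i in enumerate(query):
        if q_i == 0.0:
            continue  # sparse query: skip rows for absent terms
        for j, d_j in enumerate(diff):
            if d_j != 0.0:
                W[i][j] += tau * q_i * d_j
    return loss


# Toy usage: 2 query terms, 3 sound features.
W = [[0.0] * 3 for _ in range(2)]
pa_rank_update(W, [1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Because the update touches only rows for terms present in the query and columns where the two sounds differ, the cost of each step depends on the number of nonzero entries rather than on the full size of the mapping, which is the property that makes this kind of method attractive for very large sparse feature spaces.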
Pages: 2390-2416
Page count: 27