Sound Retrieval and Ranking Using Sparse Auditory Representations

Cited by: 37
Authors
Lyon, Richard F. [1]
Rehn, Martin [1]
Bengio, Samy [1]
Walters, Thomas C. [1]
Chechik, Gal [1]
Affiliation
[1] Google, Mountain View, CA 94043 USA
Keywords
MODEL
DOI
10.1162/NECO_a_00011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
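The abstract combines three mechanisms: sparse codes from multiple vector quantizers, a PAMIR-style passive-aggressive ranker that learns a linear map from sparse sound features to query terms, and ranking metrics (precision at top-1, average precision). Below is a minimal standard-library Python sketch of these pieces. It makes several assumptions and is not the authors' implementation:

- All function names are hypothetical.
- Codebooks are given rather than learned.
- Each codebook quantizes an equal-width contiguous slice of a frame vector. This simplifies quantizing separate regions of a stabilized auditory image.
- Queries are binary bags of term ids.
- The relevance score is the bilinear form q^T W a.
- Training uses the standard PA-I hinge-loss step with an aggressiveness parameter C.

```python
from collections import defaultdict

# Minimal sketch (hypothetical names, not the authors' code). Sparse vectors are
# dicts {feature_id: value}; a query is a set of term ids (binary bag of words).


def vq_sparse_code(frames, codebooks):
    """Bag-of-codewords vector from several vector quantizers.

    Assumption: each frame is split into len(codebooks) equal-width contiguous
    slices, and slice j is quantized by codebooks[j] (nearest codeword under
    squared Euclidean distance). Codeword counts are summed over frames, with
    ids offset per codebook so different quantizers never share an index.
    """
    counts = defaultdict(float)
    n = len(codebooks)
    for frame in frames:
        width = len(frame) // n
        offset = 0
        for j, book in enumerate(codebooks):
            chunk = frame[j * width:(j + 1) * width]
            best = min(range(len(book)),
                       key=lambda k: sum((a - b) ** 2 for a, b in zip(chunk, book[k])))
            counts[offset + best] += 1.0
            offset += len(book)
    return dict(counts)


def score(W, query_terms, features):
    """Bilinear relevance score q^T W a; W is a sparse dict keyed by (term, feature)."""
    return sum(W.get((t, f), 0.0) * v for t in query_terms for f, v in features.items())


def pa_update(W, query_terms, pos, neg, C=0.1):
    """One PA-I step on a triplet (query, relevant sound, irrelevant sound).

    Loss: max(0, 1 - q^T W a+ + q^T W a-). If positive, add tau * q (a+ - a-)^T
    to W with tau = min(C, loss / ||q (a+ - a-)^T||^2). Returns the pre-update loss.
    """
    loss = max(0.0, 1.0 - score(W, query_terms, pos) + score(W, query_terms, neg))
    if loss == 0.0:
        return 0.0  # passive step: the margin constraint already holds
    diff = defaultdict(float)
    for f, v in pos.items():
        diff[f] += v
    for f, v in neg.items():
        diff[f] -= v
    # For binary q, the squared Frobenius norm of q diff^T is |q| * ||diff||^2.
    sq_norm = len(query_terms) * sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return loss  # a+ and a- are identical; no informative update possible
    tau = min(C, loss / sq_norm)
    for t in query_terms:
        for f, v in diff.items():
            W[(t, f)] = W.get((t, f), 0.0) + tau * v
    return loss


def rank(W, query_terms, sounds):
    """Indices of sounds sorted by decreasing score for the query."""
    return sorted(range(len(sounds)),
                  key=lambda i: score(W, query_terms, sounds[i]), reverse=True)


def precision_at_k(relevance, k):
    """Fraction of relevant items among the top k of a ranked 0/1 relevance list."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of precision@i over the ranks i of relevant items (all assumed ranked)."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0
```

In use, you would proceed in four steps:

1. Compute `vq_sparse_code` for each sound file.
2. Loop `pa_update` over sampled (query, relevant sound, irrelevant sound) triplets.
3. Call `rank` for each query.
4. Score the resulting 0/1 relevance list with `precision_at_k(..., 1)` and `average_precision`.

Storing W as a dict keyed by (term, feature) keeps memory proportional to the nonzero weights. This matches the very large sparse feature space and large query vocabulary that the abstract describes.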
Pages: 2390-2416
Number of pages: 27
Related papers (50 in total)
  • [1] Sparse Contour Representations of Sound
    Lim, Yoonseob
    Shinn-Cunningham, Barbara
    Gardner, Timothy J.
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19(10): 684-687
  • [2] Sound level representations in the human auditory pathway investigated using fMRI
    Sigalovsky, I
    Hawley, ML
    Harms, MP
    Melcher, JR
    NEUROIMAGE, 2001, 13(06): S939
  • [3] Sparse, Dense, and Attentional Representations for Text Retrieval
    Luan, Yi
    Eisenstein, Jacob
    Toutanova, Kristina
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9: 329-345
  • [4] Concept Based Representations for Ranking in Geographic Information Retrieval
    Carrillo, Maya
    Villatoro-Tello, Esau
    Lopez-Lopez, Aurelio
    Eliasmith, Chris
    Villasenor-Pineda, Luis
    Montes-y-Gomez, Manuel
    ADVANCES IN NATURAL LANGUAGE PROCESSING, 2010, 6233: 85+
  • [5] Mixed Representations of Sound and Action in the Auditory Midbrain
    Quass, Gunnar L.
    Rogalla, Meike M.
    Ford, Alexander N.
    Apostolides, Pierre F.
    JOURNAL OF NEUROSCIENCE, 2024, 44(30)
  • [6] ENVIRONMENTAL SOUND CLASSIFICATION USING CENTRAL AUDITORY REPRESENTATIONS AND DEEP NEURAL NETWORKS
    Chen, Kean
    Yang, Lixue
    Sang, Zhiming
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONGRESS ON SOUND AND VIBRATION: FROM ANCIENT TO MODERN ACOUSTICS, 2016
  • [7] RECOGNITION AND RETRIEVAL OF SOUND EVENTS USING SPARSE CODING CONVOLUTIONAL NEURAL NETWORK
    Wang, Chien-Yao
    Santoso, Andri
    Mathulaprangsan, Seksan
    Chiang, Chin-Chin
    Wu, Chung-Hsien
    Wang, Jia-Ching
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 589-594
  • [8] Multiplexed and Robust Representations of Sound Features in Auditory Cortex
    Walker, Kerry M. M.
    Bizley, Jennifer K.
    King, Andrew J.
    Schnupp, Jan W. H.
    JOURNAL OF NEUROSCIENCE, 2011, 31(41): 14565-14576
  • [9] Massive perturbation of sound representations by anesthesia in the auditory brainstem
    Gosselin, Etienne
    Bagur, Sophie
    Bathellier, Brice
    SCIENCE ADVANCES, 2024, 10(42)
  • [10] Emergence of abstract sound representations in the ascending auditory system
    Harpaz, Mor
    Jankowski, Maciej M.
    Khouri, Leila
    Nelken, Israel
    PROGRESS IN NEUROBIOLOGY, 2021, 202