Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Authors
Xu, Xuenan [1]
Zhang, Pingyue [1]
Yang, Ming [2]
Zhang, Ji [2]
Wu, Mengyue [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X LANCE Lab, Shanghai, Peoples R China
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;
DOI
10.21437/Interspeech.2024-1692
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot audio classification aims to recognize and classify sound classes that the model has never seen during training. This paper presents a novel approach to zero-shot audio classification that uses automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of large language models to generate detailed attribute descriptions for each class. In contrast to previous work, which relied primarily on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes that capture different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show robust performance enhancement regardless of the model architecture.
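The zero-shot step described in the abstract can be illustrated with a minimal sketch. It assumes an audio clip and each attribute description have already been mapped into a shared embedding space by contrastively trained audio and text encoders. The embeddings below are made-up toy vectors, the function names are hypothetical, and averaging similarity over a class's attribute descriptions is an assumption made for illustration; the abstract does not say how the paper aggregates attribute scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(audio_emb, class_attr_embs):
    """Score each class by the mean cosine similarity between the audio
    embedding and that class's attribute-description embeddings.

    Mean aggregation is an illustrative assumption, not the paper's
    confirmed method.
    """
    scores = {}
    for name, embs in class_attr_embs.items():
        scores[name] = sum(cosine(audio_emb, e) for e in embs) / len(embs)
    best = max(scores, key=scores.get)
    return best, scores


# Toy text embeddings (hypothetical): two attribute descriptions per class,
# e.g. one for timbre and one for temporal pattern.
class_attr_embs = {
    "dog bark": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "rain": [[0.0, 0.9, 0.3], [0.1, 0.8, 0.4]],
}

# Toy audio embedding (hypothetical) for a clip of an unseen class.
audio_emb = [0.85, 0.15, 0.05]

best, scores = zero_shot_classify(audio_emb, class_attr_embs)
print(best)
```

Because classes are represented only by text embeddings, a new class can be added at inference time by supplying its attribute descriptions, without retraining the audio encoder.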
Pages: 4808-4812
Page count: 5