Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Times cited: 0
Authors
Xu, Xuenan [1]
Zhang, Pingyue [1]
Yang, Ming [2]
Zhang, Ji [2]
Wu, Mengyue [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X LANCE Lab, Shanghai, Peoples R China
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;
DOI
10.21437/Interspeech.2024-1692
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot audio classification aims to recognize sound classes that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of large language models to generate detailed attribute descriptions for each class. In contrast to previous works that primarily relied on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes, capturing different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show robust performance enhancement, regardless of the model architecture.
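The zero-shot decision step implied by the abstract can be illustrated with a minimal sketch. It assumes an audio encoder and a text encoder that map into a shared embedding space, and it picks the unseen class whose attribute-description embedding is most similar to the audio embedding. This is not the authors' implementation: the function names, the use of cosine similarity, the class labels, and the three-dimensional toy vectors are all hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(audio_emb, class_text_embs):
    """Return the best-matching label and all similarity scores.

    audio_emb: embedding of one audio clip (hypothetical encoder output).
    class_text_embs: label -> embedding of that class's attribute
        description (hypothetical text-encoder output).
    """
    scores = {label: cosine(audio_emb, emb) for label, emb in class_text_embs.items()}
    return max(scores, key=scores.get), scores


# Toy embeddings standing in for encoded attribute descriptions (assumed values).
class_text_embs = {
    "dog bark": [0.9, 0.1, 0.0],
    "siren": [0.1, 0.9, 0.2],
    "rain": [0.0, 0.2, 0.9],
}
audio_emb = [0.2, 0.8, 0.1]  # toy embedding of an unseen-class clip

predicted, scores = zero_shot_classify(audio_emb, class_text_embs)
print(predicted)
```

Because classification only compares embeddings, new classes can be added at inference time by encoding their descriptions, with no retraining of the classifier in this sketch.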
Pages: 4808-4812
Page count: 5