Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Cited by: 0
Authors
Xu, Xuenan [1]
Zhang, Pingyue [1]
Yang, Ming [2]
Zhang, Ji [2]
Wu, Mengyue [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X LANCE Lab, Shanghai, Peoples R China
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;
DOI
10.21437/Interspeech.2024-1692
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Zero-shot audio classification aims to recognize and classify sound classes that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of large language models to generate detailed attribute descriptions for each class. In contrast to previous works that primarily relied on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes, capturing different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show robust performance enhancement, regardless of the model architecture.
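The abstract does not specify the encoders, the attribute list, or how the attribute descriptions are combined, so the following is only a minimal sketch of the general idea: score an audio embedding against text embeddings of per-class attribute descriptions and pick the best match. The toy vectors, the class names, the helper names (`cosine`, `mean_vector`, `zero_shot_classify`), and the choice to average attribute embeddings per class are all illustrative assumptions, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def zero_shot_classify(audio_emb, class_attr_embs):
    """Return (best_class, scores) for one audio embedding.

    class_attr_embs maps a class name to a list of embeddings, one per
    attribute description of that class. Averaging them into a single
    class prototype is an assumption made for this sketch.
    """
    scores = {
        name: cosine(audio_emb, mean_vector(embs))
        for name, embs in class_attr_embs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings standing in for text-encoder outputs of
# LLM-generated attribute descriptions; no unseen class needs audio examples.
class_attr_embs = {
    "dog bark": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "siren": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
audio_emb = [0.85, 0.15, 0.05]  # hypothetical audio-encoder output

label, scores = zero_shot_classify(audio_emb, class_attr_embs)
print(label, scores)
```

In an audio-text contrastive setup, both encoders would be trained so that matching audio and text land close together in the shared space, which is what makes a similarity ranking like this usable for classes absent from training.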
Pages: 4808-4812
Page count: 5