Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Times cited: 0
Authors
Xu, Xuenan [1]
Zhang, Pingyue [1]
Yang, Ming [2]
Zhang, Ji [2]
Wu, Mengyue [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X-LANCE Lab, Shanghai, Peoples R China
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
Source
Interspeech 2024
Funding
National Natural Science Foundation of China;
Keywords
zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;
DOI
10.21437/Interspeech.2024-1692
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot audio classification aims to recognize sound classes that the model has never seen during training. This paper presents a novel approach to zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of a large language model (LLM) to generate detailed attribute descriptions for each class. In contrast to previous works that relied primarily on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes, capturing different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show that this improvement is robust regardless of the model architecture.
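The abstract does not specify the audio or text encoders, how the attribute descriptions for a class are combined, or the exact contrastive objective. The following is only a minimal sketch of how such a pipeline can work in general, not the authors' implementation. It assumes that embeddings for audio clips and for the LLM-generated attribute descriptions have already been computed by some audio encoder and text encoder. All helper names (l2_normalize, class_prototypes, zero_shot_predict, contrastive_loss) are hypothetical. Averaging attribute embeddings into one class prototype, and using a CLIP-style symmetric InfoNCE loss with temperature 0.07, are assumptions that stand in for whatever the paper actually uses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so that dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def class_prototypes(attr_embeddings):
    """Build one text prototype per class.

    ASSUMPTION (not confirmed by the paper): each class is represented by the
    mean of the normalized embeddings of its attribute descriptions.
    attr_embeddings: dict mapping class name -> array of shape (n_attributes, d).
    Returns the sorted class names and a (n_classes, d) prototype matrix.
    """
    names = sorted(attr_embeddings)
    protos = np.stack(
        [l2_normalize(l2_normalize(attr_embeddings[c]).mean(axis=0)) for c in names]
    )
    return names, protos


def zero_shot_predict(audio_emb, names, protos):
    """Label each clip with the class whose prototype is most cosine-similar.

    audio_emb: (n_clips, d) audio embeddings. Unseen classes only need text
    descriptions (and hence prototypes), not labeled training audio.
    """
    sims = l2_normalize(audio_emb) @ protos.T  # shape (n_clips, n_classes)
    return [names[i] for i in sims.argmax(axis=1)]


def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings.

    Row i of audio_emb and row i of text_emb form the positive pair; all other
    rows in the batch act as negatives. ASSUMPTION: temperature 0.07 is a
    common CLIP-style default, not a value taken from the paper.
    """
    logits = l2_normalize(audio_emb) @ l2_normalize(text_emb).T / temperature

    def cross_entropy_on_diagonal(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # shift rows for numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))  # log-softmax
        return -np.mean(np.diag(log_probs))  # positives lie on the diagonal

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))


# Toy usage with made-up 4-dimensional embeddings (illustrative values only).
attrs = {
    "dog barking": np.array([[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.2, 0.0]]),
    "rain": np.array([[0.0, 0.0, 1.0, 0.1], [0.1, 0.0, 0.9, 0.2]]),
}
names, protos = class_prototypes(attrs)
clips = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
print(zero_shot_predict(clips, names, protos))  # ['dog barking', 'rain']
```

In this sketch, the contrastive loss would be minimized during training on seen classes. At test time, zero_shot_predict compares a clip only against text prototypes, so a new class needs nothing more than its generated attribute descriptions.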
Pages: 4808-4812
Page count: 5
Related papers
50 entries in total
  • [1] Zero-Shot Classification of Art With Large Language Models
    Tojima, Tatsuya
    Yoshida, Mitsuo
    IEEE ACCESS, 2025, 13 : 17426 - 17439
  • [2] Large Language Models are Zero-Shot Reasoners
    Kojima, Takeshi
    Gu, Shixiang Shane
    Reid, Machel
    Matsuo, Yutaka
    Iwasawa, Yusuke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [3] Improving Zero-Shot Stance Detection by Infusing Knowledge from Large Language Models
    Guo, Mengzhuo
    Jiang, Xiaorui
    Liao, Yong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 2024, 14874 : 121 - 132
  • [4] Zero-Shot Audio Classification using Image Embeddings
    Dogan, Duygu
    Xie, Huang
    Heittola, Toni
    Virtanen, Tuomas
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1 - 5
  • [5] Large Language Models as Zero-Shot Conversational Recommenders
    He, Zhankui
    Xie, Zhouhang
    Jha, Rahul
    Steck, Harald
    Liang, Dawen
    Feng, Yesu
    Majumder, Bodhisattwa Prasad
    Kallus, Nathan
    McAuley, Julian
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 720 - 730
  • [6] ATTRIBUTE DRIVEN ZERO-SHOT CLASSIFICATION AND SEGMENTATION
    Yang, Shu
    Shi, Yemin
    Wang, Yaowei
    Wang, Jing
    Fei, Zesong
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW 2018), 2018,
  • [7] Attribute relation learning for zero-shot classification
    Liu, Mingxia
    Zhang, Daoqiang
    Chen, Songcan
    NEUROCOMPUTING, 2014, 139 : 34 - 46
  • [9] Zero-Shot Image Classification Based on Attribute
    Zhang, Wei
    Chen, Wenbai
    Chen, Xiangfeng
    Han, Hu
    2017 INTERNATIONAL CONFERENCE ON SECURITY, PATTERN ANALYSIS, AND CYBERNETICS (SPAC), 2017, : 25 - 30
  • [10] Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement
    Yang, Rui
    Zhu, Jiahao
    Man, Jianping
    Fang, Li
    Zhou, Yi
    KNOWLEDGE-BASED SYSTEMS, 2024, 300