Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Cited by: 0

Authors:

Xu, Xuenan [1]

Zhang, Pingyue [1]

Yang, Ming [2]

Zhang, Ji [2]

Wu, Mengyue [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X LANCE Lab, Shanghai, Peoples R China

[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China

Source:

INTERSPEECH 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;

DOI:

10.21437/Interspeech.2024-1692

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Zero-shot audio classification aims to recognize and classify sound classes that the model has never seen during training. This paper presents a novel approach to zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of large language models to generate detailed attribute descriptions for each class. In contrast to previous works that primarily relied on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes, capturing different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show robust performance enhancement, regardless of the model architecture.

Pages: 4808-4812

Page count: 5
