Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks

Cited: 0
Authors
Chang, Kai-Wei [1 ]
Hsu, Ming-Hao [2 ]
Li, Shang-Wen [3]
Lee, Hung-yi [2 ]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
[2] Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan
[3] Meta AI, Menlo Pk, CA USA
Source
INTERSPEECH 2024 | 2024
Keywords
In-context learning; speech language model; prompt tuning; few-shot learning; speech classification
DOI
10.21437/Interspeech.2024-1932
Abstract
Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an essential role in utilizing large language models (LLMs). By presenting utterance-label demonstrations to the LM at its input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters, allowing it to perform various downstream tasks in a black-box manner. Despite the success of ICL in NLP, little work has explored the possibility of ICL in speech processing. This study is the first to explore ICL for speech classification tasks with a textless speech LM. We first show that the current speech LM lacks ICL capability. We then perform warmup training on the speech LM, equipping it with the ability to learn from demonstrations. This paper explores and proposes the first speech LM capable of performing unseen classification tasks in an ICL manner.
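As a rough illustration of the ICL setup the abstract describes (not code from the paper), the Python sketch below assembles utterance-label demonstrations into a single input sequence for a frozen speech LM and picks the label token the LM scores highest after the query. The discrete-unit encoding, the SEP_ID and LABEL_IDS tokens, and score_sequence are all hypothetical placeholders; a real textless speech LM would supply the actual unit tokenizer and sequence scoring.

```python
# Hedged sketch of in-context learning with utterance-label demonstrations
# for a textless speech LM. Assumptions: utterances are already encoded as
# discrete unit ID sequences by an external quantizer, labels map to reserved
# token IDs, and score_sequence stands in for the LM's log-likelihood (a toy
# deterministic placeholder here so the example runs end to end).

from typing import List, Sequence, Tuple
import random

SEP_ID = 1000                                      # hypothetical separator token
LABEL_IDS = {"positive": 1001, "negative": 1002}   # hypothetical label tokens


def build_icl_prompt(
    demos: Sequence[Tuple[List[int], str]],
    query_units: List[int],
) -> List[int]:
    """Concatenate (utterance-units, label) demonstrations, then the query."""
    prompt: List[int] = []
    for units, label in demos:
        prompt += units + [LABEL_IDS[label], SEP_ID]
    return prompt + query_units


def score_sequence(units: List[int]) -> float:
    """Placeholder for the speech LM's sequence log-probability."""
    random.seed(sum(units))  # deterministic toy score, not a real model
    return random.random()


def classify(demos: Sequence[Tuple[List[int], str]], query_units: List[int]) -> str:
    """Pick the label whose token the frozen LM scores highest after the prompt."""
    prompt = build_icl_prompt(demos, query_units)
    scores = {lab: score_sequence(prompt + [tok]) for lab, tok in LABEL_IDS.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    demos = [([12, 55, 7, 3], "positive"), ([81, 4, 4, 19], "negative")]
    print(classify(demos, query_units=[12, 7, 3, 55]))
```

Note that no parameters are updated anywhere in this loop: the few-shot "learning" happens entirely through the demonstrations placed in the input, which is the black-box property the abstract emphasizes and the capability the paper's warmup training is meant to instill.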
Pages: 4139-4143
Page count: 5
Related Papers
50 records in total
  • [21] Adaptive In-Context Learning with Large Language Models for Bundle
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [22] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [23] Exploring Speech Emotion Recognition in Tribal Language with Deep Learning Techniques
    Nayak, Subrat Kumar
    Nayak, Ajit Kumar
    Mishra, Smitaprava
    Mohanty, Prithviraj
    Tripathy, Nrusingha
    Chaudhury, Kumar Surjeet
    INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2025, 16 (01) : 53 - 64
  • [24] LEARNING AND REPRESENTATION IN SPEECH AND LANGUAGE
    KUHL, PK
    CURRENT OPINION IN NEUROBIOLOGY, 1994, 4 (06) : 812 - 822
  • [25] Structure of pauses in speech in the context of speaker verification and classification of speech type
    Magdalena Igras-Cybulska
    Bartosz Ziółko
    Piotr Żelasko
    Marcin Witkowski
    EURASIP Journal on Audio, Speech, and Music Processing, 2016
  • [26] Structure of pauses in speech in the context of speaker verification and classification of speech type
    Igras-Cybulska, Magdalena
    Ziolko, Bartosz
    Zelasko, Piotr
    Witkowski, Marcin
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2016
  • [27] An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
    Chang, Kai-Wei
    Tseng, Wei-Cheng
    Li, Shang-Wen
    Lee, Hung-yi
    INTERSPEECH 2022, 2022, : 5005 - 5009
  • [28] DOCUMENT-SPECIFIC CONTEXT PLSA LANGUAGE MODEL FOR SPEECH RECOGNITION
    Haidar, Md Akmal
    O'Shaughnessy, Douglas
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5326 - 5330
  • [29] A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks
    Abernethy, Jacob
    Agarwal, Alekh
    Marinov, Teodor V.
    Warmuth, Manfred K.
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [30] Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks
    Chatterjee, Anwoy
    Tanwar, Eshaan
    Dutta, Subhabrata
    Chakraborty, Tanmoy
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11568 - 11587