COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Cited by: 0
Authors
Pan, Jing [1 ]
Wu, Jian [1 ]
Gaur, Yashesh [1 ]
Sivasankaran, Sunit [1 ]
Chen, Zhuo [1 ]
Liu, Shujie [1 ]
Li, Jinyu [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Source
INTERSPEECH 2024
Keywords
multi-modality; large language model; speech in-context learning; instruction tuning
DOI
10.21437/Interspeech.2024-1346
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a cost-effective method to integrate speech into a large language model (LLM), resulting in COSMIC, a multi-modal Contextual Speech Model with Instruction-following and in-context-learning Capabilities. Using GPT-3.5, we generate Speech Comprehension Test Question-Answer (SQA) pairs from speech transcriptions for supervised instruction tuning. With under 30 million trainable parameters and only 450 hours of English speech data, COSMIC demonstrates emerging capabilities in instruction following and in-context learning. Equipped with these capabilities, COSMIC achieves a maximum 33.18 BLEU score in 0-shot EN-to-X speech-to-text translation (S2TT) and a significant boost in the 1-shot setting. Additionally, there is an average 25.8% relative Word Error Rate (WER) reduction for 1-shot cross-domain adaptation. COSMIC also exhibits a significant automatic speech recognition (ASR) accuracy gain in contextual biasing tasks due to its instruction-following capability.
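The SQA data-generation step mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes the OpenAI Python SDK (chat-completions API) as a stand-in for GPT-3.5 access, and the prompt wording, function name, and Q/A output format are hypothetical placeholders for how comprehension question-answer pairs might be produced from an ASR transcription.

```python
# Minimal sketch (not the paper's code) of generating Speech Comprehension
# Test Question-Answer (SQA) pairs from a speech transcription with GPT-3.5.
# Assumes the OpenAI Python SDK; prompt text and output format are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_sqa_pairs(transcription: str, n_pairs: int = 3) -> str:
    """Ask GPT-3.5 for comprehension question-answer pairs about one transcript."""
    prompt = (
        f"Read the following speech transcription and write {n_pairs} "
        "question-answer pairs that test comprehension of its content.\n\n"
        f"Transcription: {transcription}\n\n"
        "Format each pair as:\nQ: <question>\nA: <answer>"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    # The returned text would then be parsed into (audio, question, answer)
    # triples and used as supervised instruction-tuning targets.
    return response.choices[0].message.content


if __name__ == "__main__":
    example = "the apollo eleven mission landed the first humans on the moon in july nineteen sixty nine"
    print(generate_sqa_pairs(example))
```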
Pages: 4164-4168
Number of pages: 5