COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Cited: 0
|
Authors
Pan, Jing [1 ]
Wu, Jian [1 ]
Gaur, Yashesh [1 ]
Sivasankaran, Sunit [1 ]
Chen, Zhuo [1 ]
Liu, Shujie [1 ]
Li, Jinyu [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Source
Keywords
multi-modality; large language model; speech in-context learning; instruction tuning;
DOI
10.21437/Interspeech.2024-1346
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a cost-effective method to integrate speech into a large language model (LLM), resulting in COSMIC, a multi-modal LLM: a Contextual Speech Model with Instruction-following and in-context-learning Capabilities. Using GPT-3.5, we generate Speech Comprehension Test Question-Answer (SQA) pairs from speech transcriptions for supervised instruction tuning. With under 30 million trainable parameters and only 450 hours of English speech data, COSMIC demonstrates emerging capabilities in instruction following and in-context learning. Equipped with these capabilities, COSMIC achieves a maximum 33.18 BLEU score in 0-shot EN-to-X speech-to-text translation (S2TT) and a significant boost in the 1-shot setting. Additionally, it achieves an average 25.8% relative Word Error Rate (WER) reduction in 1-shot cross-domain adaptation. COSMIC also exhibits a significant automatic speech recognition (ASR) accuracy gain in contextual biasing tasks owing to its instruction-following capability.
Pages: 4164-4168
Page count: 5