CARTGPT: Improving CART Captioning using Large Language Models

Cited by: 0

Authors:

Wu, Liang-Yuan [1]

Kleiver, Andrea

Jain, Dhruv [1]

Affiliations:

[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA

Source:

PROCEEDINGS OF THE 26TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, ASSETS 2024 | 2024

Keywords:

Accessibility; Deaf and hard of hearing; real-time captioning

DOI:

10.1145/3663548.3688494

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Communication Access Realtime Translation (CART) is a real-time captioning technology commonly used by deaf and hard of hearing (DHH) people, due to its accuracy, reliability, and ability to provide a holistic view of the conversational environment (e.g., by displaying speaker names). However, in many real-world situations (e.g., noisy environments, long meetings), CART captioning accuracy can decline considerably, thereby affecting the comprehension of DHH people. In this work-in-progress paper, we introduce CARTGPT, a system to assist CART captioners in improving their transcription accuracy. CARTGPT takes in errored CART captions and inaccurate automatic speech recognition (ASR) captions as input and uses a large language model to generate corrected captions in real time. We quantified performance on a noisy speech dataset, showing that our system outperforms both CART (+5.6% accuracy) and a state-of-the-art ASR model (+17.3%). A preliminary evaluation with three DHH users further demonstrates the promise of our approach.


Pages: 5
