CARTGPT: Improving CART Captioning using Large Language Models

Authors
Wu, Liang-Yuan [1]
Kleiver, Andrea
Jain, Dhruv [1]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Keywords
Accessibility; Deaf and hard of hearing; Real-time captioning
DOI
10.1145/3663548.3688494
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Communication Access Realtime Translation (CART) is a real-time captioning technology commonly used by deaf and hard of hearing (DHH) people because of its accuracy, reliability, and ability to provide a holistic view of the conversational environment (e.g., by displaying speaker names). However, in many real-world situations (e.g., noisy environments, long meetings), CART captioning accuracy can decline considerably, impairing DHH people's comprehension. In this work-in-progress paper, we introduce CARTGPT, a system that assists CART captioners in improving their transcription accuracy. CARTGPT takes erroneous CART captions and inaccurate automatic speech recognition (ASR) captions as input and uses a large language model to generate corrected captions in real time. We quantified performance on a noisy speech dataset, showing that our system outperforms both CART (+5.6% accuracy) and a state-of-the-art ASR model (+17.3%). A preliminary evaluation with three DHH users further demonstrates the promise of our approach.
Pages: 5