CARTGPT: Improving CART Captioning using Large Language Models

Cited: 0
Authors
Wu, Liang-Yuan [1 ]
Kleiver, Andrea
Jain, Dhruv [1 ]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Keywords
Accessibility; Deaf and hard of hearing; Real-time captioning
DOI
10.1145/3663548.3688494
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Communication Access Realtime Translation (CART) is a real-time captioning technology commonly used by deaf and hard of hearing (DHH) people due to its accuracy, reliability, and ability to provide a holistic view of the conversational environment (e.g., by displaying speaker names). However, in many real-world situations (e.g., noisy environments, long meetings), CART captioning accuracy can decline considerably, affecting DHH people's comprehension. In this work-in-progress paper, we introduce CARTGPT, a system that assists CART captioners in improving their transcription accuracy. CARTGPT takes erroneous CART captions and inaccurate automatic speech recognition (ASR) captions as input and uses a large language model to generate corrected captions in real time. We quantified performance on a noisy speech dataset, showing that our system outperforms both CART (+5.6% accuracy) and a state-of-the-art ASR model (+17.3%). A preliminary evaluation with three DHH users further demonstrates the promise of our approach.
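The abstract describes a fusion step in which an LLM receives two imperfect transcripts of the same speech (the CART stream and an ASR stream) and emits a corrected caption. A minimal sketch of how such a prompt could be assembled is below; the function name, prompt wording, and example transcripts are illustrative assumptions, not taken from the paper, and the actual LLM call is omitted.

```python
def build_correction_prompt(cart_caption: str, asr_caption: str) -> str:
    """Compose a prompt asking an LLM to reconcile two noisy transcripts
    of the same utterance into a single corrected caption."""
    return (
        "Two real-time transcripts of the same utterance are given below. "
        "Both may contain errors. Merge them into one corrected caption, "
        "preserving speaker labels and changing as little as possible.\n\n"
        f"CART transcript: {cart_caption}\n"
        f"ASR transcript: {asr_caption}\n"
        "Corrected caption:"
    )

# Example: the CART stream carries the speaker label, the ASR stream
# differs on a few words; the LLM would be asked to reconcile them.
prompt = build_correction_prompt(
    "ALICE: the quick brown fox jumps over the lazy dog",
    "the quick brown box jumps over a lazy dog",
)
print(prompt)
```

In a real pipeline this prompt would be sent to an LLM per caption segment, with latency constraints dictating segment size.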
Pages: 5