CARTGPT: Improving CART Captioning using Large Language Models

Cited by: 0
Authors
Wu, Liang-Yuan [1]
Kleiver, Andrea
Jain, Dhruv [1]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Keywords
Accessibility; Deaf and hard of hearing; Real-time captioning
DOI
10.1145/3663548.3688494
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Communication Access Realtime Translation (CART) is a real-time captioning technology commonly used by deaf and hard of hearing (DHH) people because of its accuracy, reliability, and ability to provide a holistic view of the conversational environment (e.g., by displaying speaker names). However, in many real-world situations (e.g., noisy environments, long meetings), CART captioning accuracy can decline considerably, which impairs comprehension for DHH people. In this work-in-progress paper, we introduce CARTGPT, a system that assists CART captioners in improving their transcription accuracy. CARTGPT takes erroneous CART captions and inaccurate automatic speech recognition (ASR) captions as input and uses a large language model to generate corrected captions in real time. We quantified performance on a noisy speech dataset, showing that our system outperforms both CART (+5.6% accuracy) and a state-of-the-art ASR model (+17.3%). A preliminary evaluation with three DHH users further demonstrates the promise of our approach.
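The abstract reports accuracy gains but does not say how accuracy was computed. As a rough illustration only, the sketch below assumes a word-level accuracy defined as one minus the word error rate (WER) against a reference transcript, a common convention for caption evaluation that may differ from the authors' metric. The function names `word_errors` and `word_accuracy` are hypothetical and do not come from the paper.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance: substitutions + insertions + deletions.

    Assumption: case-insensitive, whitespace tokenization, no punctuation
    normalization. The paper's actual preprocessing is not stated.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - WER. Can be negative if errors exceed reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len
```

Under this assumed definition, each caption source (CART, ASR, and the corrected output) would be scored against the same reference transcript, and the reported gains would correspond to differences between those scores.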
Pages: 5