CARTGPT: Improving CART Captioning using Large Language Models

Authors
Wu, Liang-Yuan [1]
Kleiver, Andrea
Jain, Dhruv [1]
Affiliations
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Keywords
Accessibility; Deaf and hard of hearing; Real-time captioning
DOI
10.1145/3663548.3688494
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Communication Access Realtime Translation (CART) is a real-time captioning technology commonly used by deaf and hard of hearing (DHH) people because of its accuracy, reliability, and ability to provide a holistic view of the conversational environment (e.g., by displaying speaker names). However, in many real-world situations (e.g., noisy environments, long meetings), CART captioning accuracy can decline considerably, thereby affecting the comprehension of DHH people. In this work-in-progress paper, we introduce CARTGPT, a system to assist CART captioners in improving their transcription accuracy. CARTGPT takes erroneous CART captions and inaccurate automatic speech recognition (ASR) captions as input and uses a large language model to generate corrected captions in real time. We quantified performance on a noisy speech dataset, showing that our system outperforms both CART (+5.6% accuracy) and a state-of-the-art ASR model (+17.3%). A preliminary evaluation with three DHH users further demonstrates the promise of our approach.
Pages: 5