Increasing Web3D Accessibility with Audio Captioning

Cited by: 0
Authors
Polys, Nicholas F. [1]
Wasi, Sheeban [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
Keywords
user study; neural network; YOLO; narration; sonification; people
DOI
10.1145/3611314.3615902
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer vision-based accessibility solution, powered by object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study with 44 participants to compare a set of algorithms on specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our results indicate positive subjective outcomes in accessibility for both normally sighted and visually impaired participants, and also distinguish significant effects between the task and the captioning algorithm.
Pages: 10