Increasing Web3D Accessibility with Audio Captioning

Cited by: 0
Authors
Polys, Nicholas F. [1]
Wasi, Sheeban [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
Keywords
user study; neural network; YOLO; narration; sonification; people
DOI
10.1145/3611314.3615902
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer vision-based accessibility solution, powered by object-detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study of 44 participants to compare a set of algorithms for specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our study results indicate positive subjective results in accessibility for both normally sighted and visually impaired subjects and also distinguish significant effects between the task and the captioning algorithm.
Pages: 10