Increasing Web3D Accessibility with Audio Captioning

Cited by: 0
Authors
Polys, Nicholas F. [1]
Wasi, Sheeban [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
Keywords
user study; neural network; YOLO; narration; SONIFICATION; PEOPLE
DOI
10.1145/3611314.3615902
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer vision-based accessibility solution built on object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study with 44 participants to compare a set of algorithms for specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our study results indicate positive subjective results in accessibility for both normally sighted and visually impaired participants, and also distinguish significant effects between the task and the captioning algorithm.
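As a rough illustration of the pipeline the abstract outlines (object detections turned into a spoken description), the sketch below builds a "Summarize"-style caption string from a list of detections. It is not the authors' algorithm: the `Detection` type, the `region` and `summarize` helpers, the three-way left/ahead/right split, and the 0.5 confidence threshold are all hypothetical choices made for this example. The detector (e.g. YOLO) and the text-to-speech step are left out; the returned string is what would be handed to a speech engine.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output (hypothetical structure, not from the paper)."""
    label: str         # class name reported by the detector, e.g. "chair"
    confidence: float  # detector score in [0, 1]
    x_center: float    # horizontal center of the box, normalized to [0, 1]


def region(x_center: float) -> str:
    """Map a normalized horizontal position to a coarse spoken direction."""
    if x_center < 1 / 3:
        return "on the left"
    if x_center > 2 / 3:
        return "on the right"
    return "ahead"


def summarize(detections, min_confidence=0.5):
    """Build a one-sentence caption counting objects per direction."""
    # Drop low-confidence detections (threshold is an assumption).
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "No objects detected."
    counts = Counter((region(d.x_center), d.label) for d in kept)
    phrases = []
    # Sort for a deterministic caption order.
    for (where, label), n in sorted(counts.items()):
        noun = label if n == 1 else label + "s"  # naive pluralization
        phrases.append(f"{n} {noun} {where}")
    return "I see " + ", ".join(phrases) + "."


if __name__ == "__main__":
    scene = [
        Detection("chair", 0.9, 0.1),
        Detection("chair", 0.8, 0.2),
        Detection("table", 0.7, 0.5),
        Detection("lamp", 0.3, 0.9),  # below threshold, ignored
    ]
    print(summarize(scene))
```

A "Search"-style caption could follow the same pattern by filtering `kept` to a single requested label before counting.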
Pages: 10
Related papers
50 in total
  • [21] Scalable 3D Captioning with Pretrained Models
    Luo, Tiange
    Rockwell, Chris
    Lee, Honglak
    Johnson, Justin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [22] SEEING AND HEARING TOO: AUDIO REPRESENTATION FOR VIDEO CAPTIONING
    Chuang, Shun-Po
    Wan, Chia-Hung
    Huang, Pang-Chi
    Yang, Chi-Yu
    Lee, Hung-Yi
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 381 - 388
  • [23] FeatureCut: An Adaptive Data Augmentation for Automated Audio Captioning
    Ye, Zhongjie
    Wang, Yuqing
    Wang, Helin
    Yang, Dongchao
    Zou, Yuexian
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 313 - 318
  • [24] Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
    Liu, Jizhong
    Li, Gang
    Zhang, Junbo
    Dinkel, Heinrich
    Wang, Yongqing
    Yan, Zhiyong
    Wang, Yujun
    Wang, Bin
    INTERSPEECH 2024, 2024, : 1135 - 1139
  • [25] Captioning and Indian Sign Language as Accessibility Tools in Universal Design
    Poothullil, John Mathew Martin
    Sahasrabudhe, Sujit
    Chavan, Prashant D.
    Toppo, Deepak
    SAGE OPEN, 2013, 3 (02): : 1 - 16
  • [26] Closed Captioning for Accessibility of Hard of Hearing People in Educational Environments
    Revuelta Sanz, Pablo
    Sanchez Pena, Jose Manuel
    Jimenez Dorado, Javier
    Mezcua, Belen Ruiz
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): : 305 - 306
  • [27] A Transformer-based Audio Captioning Model with Keyword Estimation
    Koizumi, Yuma
    Masumura, Ryo
    Nishida, Kyosuke
    Yasuda, Masahiro
    Saito, Shoichiro
    INTERSPEECH 2020, 2020, : 1977 - 1981
  • [28] Automated audio captioning: an overview of recent progress and new challenges
    Mei, Xinhao
    Liu, Xubo
    Plumbley, Mark D.
    Wang, Wenwu
    EURASIP Journal on Audio, Speech, and Music Processing, 2022
  • [29] Scene Graph with 3D Information for Change Captioning
    Liao, Zeming
    Huang, Qingbao
    Liang, Yu
    Fu, Mingyi
    Cai, Yi
    Li, Qing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5074 - 5082
  • [30] Enhance Temporal Relations in Audio Captioning with Sound Event Detection
    Xie, Zeyu
    Xu, Xuenan
    Wu, Mengyue
    Yu, Kai
    INTERSPEECH 2023, 2023, : 4179 - 4183