Increasing Web3D Accessibility with Audio Captioning

Authors
Polys, Nicholas F. [1]
Wasi, Sheeban [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
Keywords
user study; neural network; YOLO; narration; sonification; people
DOI
10.1145/3611314.3615902
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer vision-based accessibility solution built on object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study with 44 participants to compare a set of algorithms on specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. The study indicates positive subjective accessibility results for both normally sighted and visually impaired subjects, and shows significant effects of both the task and the captioning algorithm.
Pages: 10