Increasing Web3D Accessibility with Audio Captioning

Cited by: 0

Authors:

Polys, Nicholas F. [1]

Wasi, Sheeban [1]

Affiliations:

[1] Virginia Tech, Blacksburg, VA 24061 USA

Source:

28TH INTERNATIONAL CONFERENCE ON WEB3D TECHNOLOGY, WEB3D 2023 | 2023

Keywords:

user study; neural network; YOLO; narration; sonification; people

DOI:

10.1145/3611314.3615902

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or other visual impairments face difficulties in perceiving their real or virtual environment. To address this challenge, we propose a 3D computer vision-based accessibility solution built on object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study with 44 participants to compare a set of algorithms on specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our study results indicate positive subjective results in accessibility for both normally sighted and visually impaired subjects, and also show significant effects of the task and the captioning algorithm.


Pages: 10
