Increasing Web3D Accessibility with Audio Captioning

Times cited: 0

Authors:

Polys, Nicholas F. [1]

Wasi, Sheeban [1]

Affiliation:

[1] Virginia Tech, Blacksburg, VA 24061 USA

Source:

28TH INTERNATIONAL CONFERENCE ON WEB3D TECHNOLOGY, WEB3D 2023 | 2023

Keywords:

user study; neural network; YOLO; narration; sonification; people

DOI:

10.1145/3611314.3615902

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or other visual impairments have difficulty perceiving their real or virtual environments. To address this challenge, we propose a 3D computer-vision-based accessibility solution powered by object detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through the auditory channel, thereby enhancing situational awareness for individuals with visual impairments in both virtual and physical environments. We conducted a user study with 44 participants to compare a set of captioning algorithms on specific tasks, such as Search and Summarize, and assessed the algorithms' effectiveness through user ratings of naturalness, correctness, and satisfaction. The results indicate positive subjective ratings of accessibility from both sighted and visually impaired participants, and they identify significant effects involving the task and the captioning algorithm.
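The abstract does not specify how detections are turned into spoken captions. The following is a hypothetical illustration only, not the authors' algorithm. It assumes a detector such as YOLO has already produced, for the user's current viewpoint, a list of detections, each carrying a class label, a confidence score, and a horizontal box center normalized to [0, 1]. The names `Detection`, `summarize`, and `search`, the 0.5 confidence threshold, and the split of the view into left/ahead/right thirds are all assumptions. Each function returns a caption string that would then be handed to a text-to-speech engine (not shown).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # object class name, e.g. from a YOLO-style detector (assumed input)
    confidence: float  # detector score in [0, 1]
    x_center: float    # horizontal box center, normalized to [0, 1] across the user's view


def direction(x_center: float) -> str:
    # Map the normalized horizontal position to a coarse egocentric direction
    # by splitting the field of view into thirds (a hypothetical choice).
    if x_center < 1 / 3:
        return "on your left"
    if x_center > 2 / 3:
        return "on your right"
    return "ahead"


def summarize(detections: list[Detection], min_conf: float = 0.5) -> str:
    # Hypothetical "Summarize" caption: count confident detections per class.
    counts = Counter(d.label for d in detections if d.confidence >= min_conf)
    if not counts:
        return "No objects detected."
    # Naive English plural: append "s" when the count is greater than one.
    parts = [f"{n} {label}" + ("s" if n > 1 else "") for label, n in sorted(counts.items())]
    return "You can see " + ", ".join(parts) + "."


def search(detections: list[Detection], target: str, min_conf: float = 0.5) -> str:
    # Hypothetical "Search" caption: report whether the target is visible and roughly where,
    # using the most confident matching detection.
    hits = [d for d in detections if d.label == target and d.confidence >= min_conf]
    if not hits:
        return f"No {target} in view."
    best = max(hits, key=lambda d: d.confidence)
    return f"{target.capitalize()} {direction(best.x_center)}."


if __name__ == "__main__":
    scene = [
        Detection("chair", 0.91, 0.2),
        Detection("chair", 0.84, 0.7),
        Detection("table", 0.77, 0.5),
        Detection("cup", 0.30, 0.5),  # below the threshold, so it is ignored
    ]
    print(summarize(scene))        # You can see 2 chairs, 1 table.
    print(search(scene, "table"))  # Table ahead.
```

Splitting the view into thirds yields short directional phrases that are easy to speak aloud. A real system would likely also account for depth or distance and for which objects matter to the task, which this sketch ignores.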


Pages: 10
