Increasing Web3D Accessibility with Audio Captioning

Cited by: 0

Authors:

Polys, Nicholas F. [1]

Wasi, Sheeban [1]

Affiliations:

[1] Virginia Tech, Blacksburg, VA 24061 USA

Source:

28TH INTERNATIONAL CONFERENCE ON WEB3D TECHNOLOGY, WEB3D 2023 | 2023

Keywords:

user study; neural network; YOLO; narration; SONIFICATION; PEOPLE;

DOI:

10.1145/3611314.3615902

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Situational awareness plays a critical role in daily life, enabling individuals to comprehend their surroundings, make informed decisions, and navigate safely. However, individuals with low vision or visual impairments face difficulties in perceiving their real or virtual environment. In order to address this challenge, we propose a 3D computer vision-based accessibility solution, empowered by object-detection and text-to-speech technology. Our application describes the visual content of a Web3D scene from the user's perspective through auditory channels, thereby enhancing situational awareness for individuals with visual impairments in virtual and physical environments. We conducted a user study of 44 participants to compare a set of algorithms for specific tasks, such as Search or Summarize, and assessed the effectiveness of our captioning algorithms based on user ratings of naturalness, correctness, and satisfaction. Our study results indicate positive subjective results in accessibility for both normal and visually-impaired subjects and also distinguish significant effects between the task and the captioning algorithm.

Pages: 10
