Utilizing a Dense Video Captioning Technique for Generating Image Descriptions of Comics for People with Visual Impairments

Times cited: 1

Authors:

Kim, Suhyun [1]

Lee, Semin [1]

Kim, Kyungok [1]

Oh, Uran [1]

Affiliation:

[1] Ewha Womans Univ, Seoul, South Korea

Source:

PROCEEDINGS OF 2024 29TH ANNUAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2024 | 2024

Keywords:

comics; image description; dense video captioning; people with visual impairment;

DOI:

10.1145/3640543.3645154

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To improve the accessibility of visual figures, the automatic generation of text descriptions for individual images has been studied. However, this approach cannot be applied directly to comics: similar scenes appear in a row, so per-image descriptions can be redundant. To address this issue, we propose generating descriptions per group of related images, and we demonstrate how a dense captioning technique for videos can be utilized for this purpose, along with ways to improve its performance. To assess the effectiveness of our approach and to identify factors affecting the quality of text descriptions of comics, we conducted a preliminary study with 3 sighted evaluators and a main user study with 12 participants with visual impairments. The results show that, when the annotator is human, sighted evaluators perceived text descriptions generated per group of images to be better than those generated per image in terms of accuracy, clarity, understandability, length, informativeness, and preference. Under the same conditions, when the annotator is AI, group descriptions performed better in terms of length. In addition, people with visual impairments preferred group descriptions because of their conciseness, the smooth connections between sentences, and the absence of repetition. Based on these findings, we provide design recommendations for generating accessible comic descriptions at scale for blind users.

Pages: 750-760

Number of pages: 11
