Utilizing a Dense Video Captioning Technique for Generating Image Descriptions of Comics for People with Visual Impairments

Cited by: 1
Authors
Kim, Suhyun [1]
Lee, Semin [1]
Kim, Kyungok [1]
Oh, Uran [1]
Affiliations
[1] Ewha Womans Univ, Seoul, South Korea
Keywords
comics; image description; dense video captioning; people with visual impairment;
DOI
10.1145/3640543.3645154
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
To improve the accessibility of visual figures, the automatic generation of text descriptions for individual images has been studied. However, this approach cannot be directly applied to comics, because the descriptions become redundant when similar scenes appear in a row. To address this issue, we propose generating descriptions per group of related images, and we demonstrate how a dense captioning technique for videos can be utilized for this purpose, along with ways to improve its performance. To assess the effectiveness of our approach and to identify factors affecting the quality of text descriptions of comics, we conducted a preliminary study with 3 sighted evaluators and a main user study with 12 participants with visual impairments. The results show that, when the annotator is human, sighted evaluators perceive text descriptions generated per group of images to be better than those generated per image in terms of accuracy, clarity, understandability, length, informativeness, and preference. Under the same conditions, when the annotator is AI, the per-group descriptions performed better in terms of length. In addition, people with visual impairments prefer group descriptions because of their conciseness, smooth connectivity between sentences, and lack of repetition. Based on these findings, we provide design recommendations for generating accessible comic descriptions at scale for blind users.
Pages: 750-760
Number of pages: 11