Utilizing a Dense Video Captioning Technique for Generating Image Descriptions of Comics for People with Visual Impairments

Cited by: 1
Authors
Kim, Suhyun [1]
Lee, Semin [1]
Kim, Kyungok [1]
Oh, Uran [1]
Affiliation
[1] Ewha Womans Univ, Seoul, South Korea
Keywords
comics; image description; dense video captioning; people with visual impairment
DOI
10.1145/3640543.3645154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To improve the accessibility of visual figures, automatic generation of text descriptions for individual images has been studied. However, this approach cannot be applied directly to comics, because the descriptions become redundant when similar scenes appear in a row. To address this issue, we propose generating descriptions per group of related images, and we demonstrate how a dense captioning technique for videos can be used for this purpose, along with ways to improve its performance. To assess the effectiveness of our approach and to identify factors affecting the quality of text descriptions of comics, we conducted a preliminary study with 3 sighted evaluators and a main user study with 12 participants with visual impairments. The results show that, when the annotator is human, sighted evaluators perceive descriptions generated per group of images to be better than those generated per image in terms of accuracy, clarity, understandability, length, informativeness, and preference. Under the same conditions, when the annotator is AI, the per-group descriptions performed better in terms of length. In addition, people with visual impairments prefer group descriptions because they are concise, connect sentences smoothly, and avoid repetition. Based on the findings, we provide design recommendations for generating accessible comic descriptions at scale for blind users.
Pages: 750-760
Page count: 11