Sign-based image criteria for social interaction visual question answering

Cited by: 0
Authors
Chuganskaya, Anfisa A. [1]
Kovalev, Alexey K. [2]
Panov, Aleksandr I. [2]
Affiliations
[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, 60th October Anniversary Prospect 9, Moscow 117312, Russia
[2] Artificial Intelligence Res Inst, Moscow Inst Phys & Technol, Kerchenskaya Str 1,B1, Moscow 117303, Russia
Keywords
Visual question answering; social interaction; perception; action; scenario; sign-based world model
DOI
10.1093/jigpal/jzae026
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Multi-modal tasks have started to play a significant role in artificial intelligence research. A particular example of this domain is visual-linguistic tasks, such as visual question answering. The progress of modern machine learning systems is determined, among other things, by the data on which these systems are trained. Most modern visual question answering data sets contain a limited range of question types that can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, we propose criteria, based on psychological research, for selecting images that are suitable for social interaction visual question answering and from which such questions can be composed. We believe this should serve the progress of visual question answering systems.
Pages: 656 - 670
Page count: 15
Related papers
50 items in total
  • [21] COIN: Counterfactual Image Generation for Visual Question Answering Interpretation
    Boukhers, Zeyd
    Hartmann, Timo
    Juerjens, Jan
    SENSORS, 2022, 22 (06)
  • [22] Enhancing Image Comprehension for Computer Science Visual Question Answering
    Wang, Hongyu
    Qiang, Pengpeng
    Tan, Hongye
    Hu, Jingchang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 487 - 498
  • [23] Leveraging Visual Question Answering for Image-Caption Ranking
    Lin, Xiao
    Parikh, Devi
    COMPUTER VISION - ECCV 2016, PT II, 2016, 9906 : 261 - 277
  • [24] Cross-attention Based Text-image Transformer for Visual Question Answering
    Rezapour M.
    Recent Advances in Computer Science and Communications, 2024, 17 (04) : 72 - 78
  • [25] Towards a Sign-Based Indoor Navigation System for People with Visual Impairments
    Rituerto, Alejandro
    Fusco, Giovanni
    Coughlan, James M.
    ASSETS'16: PROCEEDINGS OF THE 18TH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2016 : 287 - 288
  • [26] Visual question answering model based on visual relationship detection
    Xi, Yuling
    Zhang, Yanning
    Ding, Songtao
    Wan, Shaohua
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 80
  • [27] QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document
    Mahamoud, Ibrahim Souleiman
    Coustaty, Mickael
    Joseph, Aurelie
    d'Andecy, Vincent Poulain
    Ogier, Jean-Marc
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 659 - 673
  • [28] Dual Attention and Question Categorization-Based Visual Question Answering
    Mishra A.
    Anand A.
    Guha P.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 81 - 91
  • [29] FVQA: Fact-Based Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (10) : 2413 - 2427
  • [30] Visual Question Answering Method Based on Yes/No Feedback
    Deng W.
    Wang J.
    Jin G.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33 (11) : 1043 - 1053