Sign-based image criteria for social interaction visual question answering

Cited by: 0
Authors
Chuganskaya, Anfisa A. [1]
Kovalev, Alexey K. [2]
Panov, Aleksandr I. [2]
Affiliations
[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, 60th October Anniversary Prospect 9, Moscow 117312, Russia
[2] Artificial Intelligence Res Inst, Moscow Inst Phys & Technol, Kerchenskaya Str 1,B1, Moscow 117303, Russia
Keywords
Visual question answering; social interaction; perception; action; scenario; sign-based world model
DOI
10.1093/jigpal/jzae026
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Multi-modal tasks have started to play a significant role in artificial intelligence research. A particular example from that domain is visual-linguistic tasks, such as visual question answering. The progress of modern machine learning systems is determined, among other things, by the data on which these systems are trained. Most modern visual question answering data sets contain a limited range of question types, which can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, we propose criteria, based on psychological research, for selecting images suitable for social interaction visual question answering and for composing such questions. We believe this should serve the progress of visual question answering systems.
Pages: 656-670
Page count: 15
Related papers
50 in total
  • [1] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [2] Visual question answering algorithm based on image caption
    Cai, Wenliang
    Qiu, Guoyong
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 2076 - 2079
  • [3] Visual Question Answering for Intelligent Interaction
    Gao, Panpan
    Sun, Hanxu
    Chen, Gang
    Wang, Ruiquan
    Li, Minggang
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [4] DCT Sign-based Robust Image Hashing
    Prungsinchai, Supakorn
    Khelifi, Fouad
    Bouridane, Ahmed
    2013 8TH INTERNATIONAL CONFERENCE FOR INTERNET TECHNOLOGY AND SECURED TRANSACTIONS (ICITST), 2013, : 401 - 405
  • [5] Image captioning improved visual question answering
    Himanshu Sharma
    Anand Singh Jalal
    Multimedia Tools and Applications, 2022, 81 : 34775 - 34796
  • [6] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [7] Improving Visual Question Answering by Image Captioning
    Shao, Xiangjun
    Dong, Hongsong
    Wu, Guangsheng
    IEEE ACCESS, 2025, 13 : 46299 - 46311
  • [8] Compact Trilinear Interaction for Visual Question Answering
    Tuong Do
    Thanh-Toan Do
    Huy Tran
    Tjiputra, Erman
    Tran, Quang D.
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 392 - 401
  • [9] MoBVQA: A Modality based Medical Image Visual Question Answering System
    Lubna, A.
    Kalady, Saidalavi
    Lijiya, A.
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019, : 727 - 732
  • [10] Using similarity based image caption to aid visual question answering
    Kang, Joonseo
    Lim, Changwon
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (02) : 191 - 204