Sign-based image criteria for social interaction visual question answering

Times cited: 0

Authors:

Chuganskaya, Anfisa A. [1]

Kovalev, Alexey K. [2]

Panov, Aleksandr I. [2]

Affiliations:

[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, 60th October Anniversary Prospect 9, Moscow 117312, Russia

[2] Artificial Intelligence Res Inst, Moscow Inst Phys & Technol, Kerchenskaya Str 1,B1, Moscow 117303, Russia

Source:

LOGIC JOURNAL OF THE IGPL | 2024 / Vol. 32 / Issue 04

Keywords:

Visual question answering; social interaction; perception; action; scenario; sign-based world model

DOI:

10.1093/jigpal/jzae026

CLC classification number:

O29 [Applied mathematics]

Subject classification code:

070104

Abstract:

Multi-modal tasks have come to play a significant role in artificial intelligence research. Visual-linguistic tasks, such as visual question answering, are a particular example of this domain. The progress of modern machine learning systems depends, among other things, on the data on which they are trained. Most modern visual question answering datasets contain a limited range of question types, which can be answered either by directly inspecting the image itself or by drawing on external data. At the same time, social interactions between people receive insufficient attention, which limits the scope of visual question answering systems. In this paper, drawing on psychological research, we propose criteria for selecting images suitable for composing social interaction visual question answering questions. We believe this should advance the development of visual question answering systems.

Pages: 656-670

Number of pages: 15
