Sign-based image criteria for social interaction visual question answering

Cited by: 0

Authors:

Chuganskaya, Anfisa A. [1]

Kovalev, Alexey K. [2]

Panov, Aleksandr I. [2]

Affiliations:

[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, 60th October Anniversary Prospect 9, Moscow 117312, Russia

[2] Artificial Intelligence Res Inst, Moscow Inst Phys & Technol, Kerchenskaya Str 1,B1, Moscow 117303, Russia

Source:

LOGIC JOURNAL OF THE IGPL | 2024 / Vol. 32 / Issue 04

Keywords:

Visual question answering; social interaction; perception; action; scenario; sign-based world model;

DOI:

10.1093/jigpal/jzae026

Chinese Library Classification number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

Multi-modal tasks have started to play a significant role in artificial intelligence research. A particular example from that domain is visual-linguistic tasks, such as visual question answering. The progress of modern machine learning systems is determined, among other things, by the data on which these systems are trained. Most modern visual question answering data sets contain a limited range of question types, which can be answered either by directly accessing the image itself or by using external data. At the same time, insufficient attention is paid to social interactions between people, which limits the scope of visual question answering systems. In this paper, we propose criteria, based on psychological research, for selecting images suitable for composing social interaction visual question answering questions. We believe this should serve the progress of visual question answering systems.


Pages: 656-670

Number of pages: 15
