Dual-Branch Collaborative Learning for Visual Question Answering

Times cited: 0

Authors:

Tian, Weidong [1,2]

Zhao, Junxiang [1]

Xu, Wenzheng [1]

Zhao, Zhongqiu [1,2,3]

Affiliations:

[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China

[2] HFUT, Intelligent Mfg Inst, Hefei, Peoples R China

[3] Guangxi Acad Sci, Nanning, Guangxi, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024 / Vol. 14864

Funding:

National Natural Science Foundation of China;

Keywords:

VQA; Relational Reasoning; Attention; Collaborative learning;

DOI:

10.1007/978-981-97-5588-2_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

A good visual question answering (VQA) model should be able to reason about the underlying relationships between an image and a question. Recent works have applied graph-based methods to visual reasoning, but these methods struggle to reason effectively when the connection between the question and the visual objects is unclear. In this paper, we design a dual-branch network based on collaborative learning that attends simultaneously to relational reasoning and to attention-based deep alignment between images and questions. Our question-aware enhancement module makes better use of question information, and our joint prediction module fully integrates the strengths of the two branches. Extensive experiments demonstrate that the proposed method outperforms current state-of-the-art methods.

Pages: 96-107

Number of pages: 12
