Dual-Branch Collaborative Learning for Visual Question Answering

Cited by: 0
Authors
Tian, Weidong [1, 2]
Zhao, Junxiang [1]
Xu, Wenzheng [1]
Zhao, Zhongqiu [1, 2, 3]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] HFUT, Intelligent Mfg Inst, Hefei, Peoples R China
[3] Guangxi Acad Sci, Nanning, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
VQA; Relational Reasoning; Attention; Collaborative learning;
DOI
10.1007/978-981-97-5588-2_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Good visual question answering (VQA) models can reason about the underlying relationships in the context of images and questions. Recent works have used graph-based methods for visual reasoning, but these methods reason poorly when the connection between the question and the visual objects is unclear. In this paper, we design a dual-branch network based on collaborative learning that simultaneously performs relational reasoning and attention-based deep alignment between images and questions. Our question-aware enhancement module makes better use of question information, and our joint prediction module fully integrates the two branches. Extensive experimental results demonstrate that the proposed method outperforms current state-of-the-art methods.
Pages: 96-107
Page count: 12
Related papers
50 in total
  • [21] A Survey on Representation Learning in Visual Question Answering
    Sahani, Manish
    Singh, Priyadarshan
    Jangpangi, Sachin
    Kumar, Shailender
    MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 326 - 336
  • [22] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [23] Visual Question Answering as a Meta Learning Task
    Teney, Damien
    van den Hengel, Anton
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 229 - 245
  • [24] Selective residual learning for Visual Question Answering
    Hong, Jongkwang
    Park, Sungho
    Byun, Hyeran
    NEUROCOMPUTING, 2020, 402 : 366 - 374
  • [25] Dual Attention and Question Categorization-Based Visual Question Answering
    Mishra A.
    Anand A.
    Guha P.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 81 - 91
  • [26] Dual modality prompt learning for visual question-grounded answering in robotic surgery
    Zhang, Yue
    Fan, Wanshu
    Peng, Peixi
    Yang, Xin
    Zhou, Dongsheng
    Wei, Xiaopeng
    VISUAL COMPUTING FOR INDUSTRY BIOMEDICINE AND ART, 2024, 7 (01)
  • [27] Local dual-branch attention feature learning framework from UAVs for visual defect detection
    Xu, Jianbing
    Zhou, Jiangxin
    Xu, Dongxu
    Chen, Yu
    VISUAL COMPUTER, 2025,
  • [28] Addressing domain discrepancy: A dual-branch collaborative model to unsupervised dehazing
    Fan, Shuaibin
    Xue, Minglong
    Ning, Aoxiang
    Zhong, Senming
    PATTERN RECOGNITION LETTERS, 2025, 189 : 150 - 156
  • [29] Learning Visual Knowledge Memory Networks for Visual Question Answering
    Su, Zhou
    Zhu, Chen
    Dong, Yinpeng
    Cai, Dongqi
    Chen, Yurong
    Li, Jianguo
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7736 - 7745
  • [30] Visual-Semantic Dual Channel Network for Visual Question Answering
    Wang, Xin
    Chen, Qiaohong
    Hu, Ting
    Sun, Qi
    Jia, Yubo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,