A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Cited by: 14
Authors
Guo, Zhicheng [1]
Zhao, Jiaxuan [1]
Jiao, Licheng [1]
Liu, Xu [1]
Liu, Fang [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Quaternions; Task analysis; Cognition; Visualization; Knowledge discovery; Feature extraction; Convolution; Video question answering; multimodal features; quaternion operations; hypergraph convolution;
DOI
10.1109/TMM.2021.3120544
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fusion and interaction of multimodal features are essential for video question answering. Structural information composed of the relationships between different objects in videos is very complex, which restricts understanding and reasoning. In this paper, we propose a quaternion hypergraph network (QHGN) for multimodal video question answering, to simultaneously involve multimodal features and structural information. Since quaternion operations are suitable for multimodal interactions, four components of the quaternion vectors are applied to represent the multimodal features. Furthermore, we construct a hypergraph based on the visual objects detected in the video. Most importantly, the quaternion hypergraph convolution operator is theoretically derived to realize multimodal and relational reasoning. Question and candidate answers are embedded in quaternion space, and a Q & A reasoning module is creatively designed for selecting the answer accurately. Moreover, the unified framework can be extended to other video-text tasks with different quaternion decoders. Experimental evaluations on the TVQA dataset and DramaQA dataset show that our method achieves state-of-the-art performance.
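The abstract names two building blocks, quaternion operations and hypergraph convolution, without giving formulas. The following is a minimal sketch of each in NumPy, not the operator derived in the paper: it pairs the standard Hamilton product with the commonly used normalized hypergraph propagation matrix Dv^-1/2 H De^-1 H^T Dv^-1/2 (unit hyperedge weights). The function names, the incidence matrix `H`, and the single shared quaternion weight `w` are illustrative assumptions.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternion arrays of shape (..., 4), ordered (r, i, j, k)."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], axis=-1)

def hypergraph_propagation(H):
    """Return Dv^-1/2 H De^-1 H^T Dv^-1/2 for an incidence matrix H (nodes x hyperedges)."""
    H = np.asarray(H, dtype=float)
    dv_inv_sqrt = 1.0 / np.sqrt(H.sum(axis=1))  # inverse square root of node degrees
    de_inv = 1.0 / H.sum(axis=0)                # inverse hyperedge degrees
    left = dv_inv_sqrt[:, None] * H * de_inv[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return left @ right

def quaternion_hypergraph_layer(X, H, w):
    """Toy layer (assumption): smooth each quaternion component over the
    hypergraph, then mix the four components with one quaternion weight w."""
    smoothed = hypergraph_propagation(H) @ np.asarray(X, dtype=float)  # (nodes, 4)
    return hamilton_product(smoothed, w)

# Hypothetical example: 3 detected objects (nodes), 2 hyperedges.
H_demo = np.array([[1, 0],
                   [1, 1],
                   [0, 1]])
X_demo = np.eye(3, 4)                      # one quaternion feature per node
w_demo = np.array([1.0, 0.0, 0.0, 0.0])    # identity quaternion leaves features unmixed
out_demo = quaternion_hypergraph_layer(X_demo, H_demo, w_demo)
```

In this sketch each of the four quaternion components could hold one modality's feature, which is the role the abstract assigns to them; because the Hamilton product mixes all four components, a single quaternion weight already couples the modalities.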
Pages: 38-49
Page count: 12