A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Cited by: 14
Authors
Guo, Zhicheng [1]
Zhao, Jiaxuan [1]
Jiao, Licheng [1]
Liu, Xu [1]
Liu, Fang [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Quaternions; Task analysis; Cognition; Visualization; Knowledge discovery; Feature extraction; Convolution; Video question answering; multimodal features; quaternion operations; hypergraph convolution
DOI
10.1109/TMM.2021.3120544
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fusion and interaction of multimodal features are essential for video question answering. Structural information composed of the relationships between different objects in videos is very complex, which restricts understanding and reasoning. In this paper, we propose a quaternion hypergraph network (QHGN) for multimodal video question answering, to simultaneously involve multimodal features and structural information. Since quaternion operations are suitable for multimodal interactions, four components of the quaternion vectors are applied to represent the multimodal features. Furthermore, we construct a hypergraph based on the visual objects detected in the video. Most importantly, the quaternion hypergraph convolution operator is theoretically derived to realize multimodal and relational reasoning. Question and candidate answers are embedded in quaternion space, and a Q & A reasoning module is creatively designed for selecting the answer accurately. Moreover, the unified framework can be extended to other video-text tasks with different quaternion decoders. Experimental evaluations on the TVQA dataset and DramaQA dataset show that our method achieves state-of-the-art performance.
Pages: 38-49
Page count: 12
Related papers
50 in total
  • [21] Frame Augmented Alternating Attention Network for Video Question Answering
    Zhang, Wenqiao
    Tang, Siliang
    Cao, Yanpeng
    Pu, Shiliang
    Wu, Fei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (04) : 1032 - 1041
  • [22] Hierarchical Recurrent Contextual Attention Network for Video Question Answering
    Zhou, Fei
    Han, Yahong
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 280 - 290
  • [23] Hierarchical Representation Network With Auxiliary Tasks for Video Captioning and Video Question Answering
    Gao, Lianli
    Lei, Yu
    Zeng, Pengpeng
    Song, Jingkuan
    Wang, Meng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 202 - 215
  • [24] Affective question answering on video
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Gou, Jianping
    NEUROCOMPUTING, 2019, 363 : 125 - 139
  • [25] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [26] MIMOQA: Multimodal Input Multimodal Output Question Answering
    Singh, Hrituraj
    Nasery, Anshul
    Mehta, Denil
    Agarwal, Aishwarya
    Lamba, Jatin
    Srinivasan, Balaji Vasan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 5317 - 5332
  • [27] A multi-scale self-supervised hypergraph contrastive learning framework for video question answering
    Wang, Zheng
    Wu, Bin
    Ota, Kaoru
    Dong, Mianxiong
    Li, He
    NEURAL NETWORKS, 2023, 168 : 272 - 286
  • [28] Video Graph Transformer for Video Question Answering
    Xiao, Junbin
    Zhou, Pan
    Chua, Tat-Seng
    Yan, Shuicheng
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 39 - 58
  • [29] Video Reference: A Video Question Answering Engine
    Gao, Lei
    Li, Guangda
    Zheng, Yan-Tao
    Hong, Richang
    Chua, Tat-Seng
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 799 - +
  • [30] Hypergraph Convolutional Network for Multi-Hop Knowledge Base Question Answering (Student Abstract)
    Han, Jiale
    Cheng, Bo
    Wang, Xu
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13801 - 13802