From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Cited by: 17
Authors
Li, Jiangtong [1]
Niu, Li [1]
Zhang, Liqing [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
National Key R&D Program of China; U.S. National Science Foundation
DOI
10.1109/CVPR52688.2022.02059
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video understanding has achieved great success in representation learning, such as video captioning, video object grounding, and descriptive video question answering. However, current methods still struggle with video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution that requires both answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that state-of-the-art methods are strong in description but weak in reasoning. We hope that Causal-VidQA can guide video understanding research from representation learning towards deeper reasoning. The dataset and related resources are available at https://github.com/bcmi/Causal-VidQA.git.
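The abstract does not state how the two-step (answer, then reason) commonsense questions are scored. A minimal sketch, assuming a VCR-style convention in which a question counts as solved under the combined metric only when both the chosen answer and the chosen reason are correct. `TwoStepPrediction` and `two_step_accuracy` are hypothetical names for illustration, not part of the released Causal-VidQA code.

```python
from dataclasses import dataclass


@dataclass
class TwoStepPrediction:
    # Hypothetical record: index of the chosen answer and of the chosen reason
    answer: int
    reason: int


def two_step_accuracy(preds, golds):
    """Return (answer-only, reason-only, answer-and-reason) accuracies.

    Assumption: the combined score credits an item only if BOTH the
    answer and the reason match the ground truth.
    """
    if not preds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal length")
    n = len(preds)
    pairs = list(zip(preds, golds))
    acc_a = sum(p.answer == g.answer for p, g in pairs) / n
    acc_r = sum(p.reason == g.reason for p, g in pairs) / n
    acc_ar = sum(p.answer == g.answer and p.reason == g.reason for p, g in pairs) / n
    return acc_a, acc_r, acc_ar


if __name__ == "__main__":
    # Toy data: item 1 fully correct, item 2 wrong reason, item 3 wrong answer
    preds = [TwoStepPrediction(0, 1), TwoStepPrediction(2, 3), TwoStepPrediction(1, 1)]
    golds = [TwoStepPrediction(0, 1), TwoStepPrediction(2, 0), TwoStepPrediction(3, 1)]
    print(two_step_accuracy(preds, golds))
```

Under this assumed convention the combined score can never exceed either single-step score, which is why a model that answers well but justifies poorly would look strong on answers alone yet weak once the reason is required.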
Pages: 21241–21250
Number of pages: 10