From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Cited by: 17
Authors
Li, Jiangtong [1]
Niu, Li [1]
Zhang, Liqing [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
National Key R&D Program of China; U.S. National Science Foundation
DOI
10.1109/CVPR52688.2022.02059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video understanding has achieved great success in representation learning, on tasks such as video captioning, video object grounding, and descriptive video question answering. However, current methods still struggle with video reasoning, including evidence reasoning and commonsense reasoning. To move video understanding toward video reasoning, we present the Causal-VidQA task, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution: answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that state-of-the-art methods are strong on description but weak on reasoning. We hope that Causal-VidQA can guide research on video understanding from representation learning to deeper reasoning. The dataset and related resources are available at https://github.com/bcmi/Causal-VidQA.git.
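The two-step setup described in the abstract can be illustrated with a minimal scoring sketch. Everything here is an assumption made for illustration and is not taken from the dataset repository: the `Item` record, its field names, the use of integer option indices, and in particular the rule that a prediction or counterfactual item counts as correct only when both the answer and the reason match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Question types named in the abstract. The two commonsense types
# are the ones that use the two-step (answer + reason) setup.
TWO_STEP_TYPES = {"prediction", "counterfactual"}
ALL_TYPES = {"description", "explanation"} | TWO_STEP_TYPES


@dataclass
class Item:
    """Hypothetical record for one question; option indices are assumed."""
    qtype: str
    gold_answer: int
    pred_answer: int
    gold_reason: Optional[int] = None  # only used for two-step types
    pred_reason: Optional[int] = None


def is_correct(item: Item) -> bool:
    """Score one item under the assumed joint answer-and-reason rule."""
    if item.qtype not in ALL_TYPES:
        raise ValueError(f"unknown question type: {item.qtype}")
    answer_ok = item.pred_answer == item.gold_answer
    if item.qtype in TWO_STEP_TYPES:
        # Assumption: both steps must be right for the item to count.
        reason_ok = (
            item.gold_reason is not None
            and item.pred_reason == item.gold_reason
        )
        return answer_ok and reason_ok
    return answer_ok


def accuracy_by_type(items: List[Item]) -> Dict[str, float]:
    """Return per-question-type accuracy over the given items."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for item in items:
        totals[item.qtype] = totals.get(item.qtype, 0) + 1
        hits[item.qtype] = hits.get(item.qtype, 0) + int(is_correct(item))
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}
```

Reporting accuracy per question type, rather than one pooled number, is what makes a gap between description and reasoning questions visible.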
Pages: 21241-21250
Page count: 10