From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Cited by: 17
Authors
Li, Jiangtong [1]
Niu, Li [1]
Zhang, Liqing [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China; National Science Foundation (USA)
DOI
10.1109/CVPR52688.2022.02059
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video understanding has achieved great success in representation learning, such as video caption, video object grounding, and video descriptive question-answer. However, current methods still struggle on video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution by answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that the state-of-the-art methods are strong in descriptions but weak in reasoning. We hope that Causal-VidQA can guide the research of video understanding from representation learning to deeper reasoning. The dataset and related resources are available at https://github.com/bcmi/Causal-VidQA.git.
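The two-step setup for commonsense questions described in the abstract can be illustrated with a small scoring sketch. This is only a sketch under stated assumptions, not the authors' evaluation code: the `Prediction` record, the `is_correct` and `accuracy_by_type` helpers, and the rule that a prediction or counterfactual question counts as correct only when both the answer and the reason are right are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Question types named in the abstract. Treating the last two as "two-step"
# (answer plus reason) follows the abstract; the scoring rule is an assumption.
COMMONSENSE_TYPES = {"prediction", "counterfactual"}


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record of one model output on one question."""
    question_type: str                     # e.g. "description", "explanation"
    answer_correct: bool                   # was the chosen answer right?
    reason_correct: Optional[bool] = None  # only used for commonsense types


def is_correct(p: Prediction) -> bool:
    """Assumed rule: commonsense questions need both answer and reason right."""
    if p.question_type in COMMONSENSE_TYPES:
        return p.answer_correct and bool(p.reason_correct)
    return p.answer_correct


def accuracy_by_type(preds: Iterable[Prediction]) -> Dict[str, float]:
    """Return the fraction of correct predictions for each question type."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for p in preds:
        totals[p.question_type] = totals.get(p.question_type, 0) + 1
        hits[p.question_type] = hits.get(p.question_type, 0) + int(is_correct(p))
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


if __name__ == "__main__":
    sample = [
        Prediction("description", True),
        Prediction("prediction", True, False),  # right answer, wrong reason
        Prediction("prediction", True, True),
    ]
    print(accuracy_by_type(sample))
```

Reporting accuracy per question type, as this sketch does, is one way to surface the gap the abstract reports between description and reasoning questions.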
Pages: 21241 - 21250
Number of pages: 10
Related papers
50 in total
  • [21] Joint Answering and Explanation for Visual Commonsense Reasoning
    Li, Zhenyang
    Guo, Yangyang
    Wang, Kejie
    Wei, Yinwei
    Nie, Liqiang
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3836 - 3846
  • [22] MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
    Min, Juhong
    Buch, Shyamal
    Nagrani, Arsha
    Cho, Minsu
    Schmid, Cordelia
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13235 - 13245
  • [23] Explore Multi-Step Reasoning in Video Question Answering
    Han, Yahong
    PROCEEDINGS OF THE 1ST WORKSHOP AND CHALLENGE ON COMPREHENSIVE VIDEO UNDERSTANDING IN THE WILD (COVIEW'18), 2018, : 5 - 5
  • [24] Explore Multi-Step Reasoning in Video Question Answering
    Song, Xiaomeng
    Shi, Yucheng
    Chen, Xin
    Han, Yahong
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 239 - 247
  • [25] Collaborative Aware Bidirectional Semantic Reasoning for Video Question Answering
    Wu, Xize
    Wu, Jiasong
    Zhu, Lei
    Senhadji, Lotfi
    Shu, Huazhong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (03) : 2074 - 2086
  • [26] Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
    Lv, Shangwen
    Guo, Daya
    Xu, Jingjing
    Tang, Duyu
    Duan, Nan
    Gong, Ming
    Shou, Linjun
    Jiang, Daxin
    Cao, Guihong
    Hu, Songlin
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8449 - 8456
  • [27] Towards Reasoning Ability in Scene Text Visual Question Answering
    Wang, Qingqing
    Xiao, Liqiang
    Lu, Yue
    Jin, Yaohui
    He, Hao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2281 - 2289
  • [28] ON THE REPRESENTATION OF COMMONSENSE KNOWLEDGE BY POSSIBILISTIC REASONING
    YAGER, RR
    INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1989, 31 (05): : 587 - 610
  • [29] Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering
    Mao, Jianguo
    Jiang, Wenbin
    Wang, Xiangdong
    Feng, Zhifan
    Lyu, Yajuan
    Liu, Hong
    Zhu, Yong
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3894 - 3904
  • [30] External Commonsense Knowledge as a Modality for Social Intelligence Question-Answering
    Natu, Sanika
    Sural, Shounak
    Sarkar, Sulagna
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3036 - 3042