Video question answering via traffic knowledge database and question classification

Authors
Xiaoyong Sun
Yu Dai
Yuchen Wang
Weifeng Ma
Xuefen Lin
Affiliations
[1] Zhejiang University of Science and Technology, School of Information and Electronic Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Video question answering; Knowledge; Transformer; Question classification
DOI
Not available
Abstract
Video question answering (VideoQA) is the task of answering questions about videos. The main idea is to understand the content of the video and combine it with the relevant semantic context to answer various types of questions. Existing methods typically analyze the spatiotemporal correlations of the entire video to answer questions. However, for some simple questions, the answer depends on only a specific frame of the video, and analyzing the entire video unnecessarily increases the learning cost. For some complex questions, the information contained in the video is limited, and these methods are not sufficient to answer such questions fully. Therefore, we propose a VideoQA model based on question classification and a traffic knowledge database. The model starts from the perspective of the question: it classifies questions into general scene questions and causal questions and processes the two types with different methods. For general scene questions, we first extract the key frames of the video, which converts the problem into a simpler image question-answering task, and then process it with top-down and bottom-up attention mechanisms. For causal questions, we design a lightweight traffic knowledge database that provides relevant traffic knowledge not originally present in VideoQA datasets, to support the model's reasoning. We then use a question- and knowledge-guided aggregation graph attention network to process causal questions. The experimental results show that, while greatly reducing resource costs, our model performs better on the TrafficQA dataset than models pretrained on millions of external data samples.
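The question-first routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how questions are classified, so the keyword cues in `CAUSAL_CUES`, the `Route` type, and the branch names are all hypothetical placeholders chosen only to show the control flow (classify the question, then dispatch to one of two branches).

```python
from dataclasses import dataclass

# Hypothetical cue phrases. The paper's actual question classifier is not
# described in the abstract; a keyword match is assumed here for illustration.
CAUSAL_CUES = ("why", "what caused", "reason", "what if")


@dataclass
class Route:
    question_type: str  # "causal" or "general_scene"
    pipeline: str       # placeholder name of the branch that would handle it


def classify_question(question: str) -> str:
    """Label a question as causal if it contains any assumed cue phrase."""
    q = question.lower()
    return "causal" if any(cue in q for cue in CAUSAL_CUES) else "general_scene"


def route(question: str) -> Route:
    """Dispatch a question to one of the two branches named in the abstract."""
    qtype = classify_question(question)
    if qtype == "causal":
        # Causal branch: knowledge database + graph attention network.
        return Route(qtype, "knowledge_guided_graph_attention")
    # General scene branch: key-frame extraction + image question answering.
    return Route(qtype, "key_frame_image_qa")


if __name__ == "__main__":
    for q in ("Why did the car crash?", "How many cars are in the video?"):
        print(q, "->", route(q))
```

The point of the sketch is the design choice the abstract motivates: simple questions avoid full-video processing by going to the key-frame branch, and only causal questions incur the cost of knowledge retrieval and graph reasoning.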