Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding

Cited by: 0
Authors
Xu, Yuanxing [1]
Wei, Yuting [1]
Wu, Bin [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Deep video understanding; Multimodal analysis; Relation discrimination; Question answering;
DOI
10.1145/3581783.3612871
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The surge in video and social media content underscores the need for a deeper understanding of multimedia data. Most mature video understanding techniques perform well on short-form content that requires only shallow understanding, but perform poorly on long-form videos that require deep understanding and reasoning. The Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics to address the problem of holistically analyzing long videos and extracting useful knowledge to answer different types of queries. This paper introduces a query-aware method for long video localization and relation discrimination that leverages an image-language pretrained model. The model selects the frames pertinent to each query, obviating the need for a complete movie-level knowledge graph. The approach ranked first and fourth on two groups of movie-level queries. Extensive experiments and the final rankings demonstrate its effectiveness and robustness.
Pages: 9591 - 9595
Page count: 5
Related papers
50 in total
  • [21] Second International Workshop on Deep Video Understanding
    Curtis, Keith
    Awad, George
    Rajput, Shahzad
    Soboroff, Ian
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 801 - 802
  • [22] PyTorchVideo: A Deep Learning Library for Video Understanding
    Fan, Haoqi
    Murrell, Tullie
    Wang, Heng
    Alwala, Kalyan Vasudev
    Li, Yanghao
    Li, Yilei
    Xiong, Bo
    Ravi, Nikhila
    Li, Meng
    Yang, Haichuan
    Malik, Jitendra
    Girshick, Ross
    Feiszli, Matt
    Adcock, Aaron
    Lo, Wan-Yen
    Feichtenhofer, Christoph
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3783 - 3786
  • [23] Gaze Aware Deep Learning Model for Video Summarization
    Wu, Jiaxin
    Zhong, Sheng-hua
    Ma, Zheng
    Heinen, Stephen J.
    Jiang, Jianmin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165 : 285 - 295
  • [24] Motion-Aware Deep Video Coding Network
    Khan, Rida
    Liu, Ying
    BIG DATA II: LEARNING, ANALYTICS, AND APPLICATIONS, 2020, 11395
  • [25] Uncertainty-Aware Deep Video Compression With Ensembles
    Ma, Wufei
    Li, Jiahao
    Li, Bin
    Lu, Yan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7863 - 7872
  • [26] Budget-Aware Deep Semantic Video Segmentation
    Mahasseni, Behrooz
    Todorovic, Sinisa
    Fern, Alan
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2077 - 2086
  • [27] PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation
    Yuan, Haobo
    Li, Xiangtai
    Yang, Yibo
    Cheng, Guangliang
    Zhang, Jing
    Tong, Yunhai
    Zhang, Lefei
    Tao, Dacheng
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 582 - 599
  • [28] Towards Long-Form Video Understanding
    Wu, Chao-Yuan
    Krahenbuhl, Philipp
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1884 - 1894
  • [29] Understanding trust in privacy-aware video surveillance systems
    Hatem A. Rashwan
    Agusti Solanas
    Domènec Puig
    Antoni Martínez-Ballesté
    International Journal of Information Security, 2016, 15 : 225 - 234
  • [30] Understanding trust in privacy-aware video surveillance systems
    Rashwan, Hatem A.
    Solanas, Agusti
    Puig, Domenec
    Martinez-Balleste, Antoni
    INTERNATIONAL JOURNAL OF INFORMATION SECURITY, 2016, 15 (03) : 225 - 234