Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding

Times cited: 0

Authors:

Xu, Yuanxing [1]

Wei, Yuting [1]

Wu, Bin [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Deep video understanding; Multimodal analysis; Relation discrimination; Question answering;

DOI:

10.1145/3581783.3612871

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The surge in video and social media content underscores the need for a deeper understanding of multimedia data. Most mature video understanding techniques perform well on short videos whose content requires only shallow understanding, but they struggle with long videos that require deep understanding and reasoning. The Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics, so that long videos can be analyzed holistically and useful knowledge can be extracted to answer different types of queries. This paper introduces a query-aware method for long video localization and relation discrimination that leverages an image-language pretrained model. The model selects the frames pertinent to each query, removing the need to build a complete movie-level knowledge graph. Our approach ranked first and fourth on the two groups of movie-level queries. Extensive experiments and the final rankings demonstrate its effectiveness and robustness.
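As a rough illustration of what "query-aware" frame selection can mean, the sketch below scores every frame by cosine similarity between a query embedding and that frame's embedding, then keeps the top k frames in temporal order. This is a minimal sketch under stated assumptions, not the authors' method. The abstract does not specify the pretrained model, the scoring function, or how many frames are kept. The helper names `cosine` and `select_query_frames` are hypothetical, and the embeddings are assumed to come from a single CLIP-style image-language encoder so that text and frames share one embedding space.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_query_frames(
    query_emb: Sequence[float],
    frame_embs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical helper: indices of the k frames most similar to the query.

    Assumes query_emb and frame_embs were produced by the same image-language
    encoder (e.g. a CLIP-style model); the paper's actual model and k are unknown.
    """
    scores = [cosine(query_emb, f) for f in frame_embs]
    # Rank by descending similarity; sorted() is stable, so ties keep the earlier frame first.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Return the selected frames in temporal order for downstream reasoning.
    return sorted(ranked[:k])


# Toy 2-D embeddings standing in for real encoder outputs.
query = [1.0, 0.0]
frames = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0], [0.5, 0.5]]
print(select_query_frames(query, frames, 2))  # prints [1, 2]
```

Only the selected frames would then be passed to later stages, which is one way a system can avoid processing (or building a knowledge graph over) the whole movie.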

Pages: 9591-9595

Page count: 5