TVQA: Localized, Compositional Video Question Answering

Cited by: 0

Authors:

Lei, Jie [1]

Yu, Licheng [1]

Bansal, Mohit [1]

Berg, Tamara L. [1]

Affiliations:

[1] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27515 USA

Source:

2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018) | 2018

Funding:

U.S. National Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. We provide analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task. The dataset is publicly available at http://tvqa.cs.unc.edu.


Pages: 1369-1379

Page count: 11

