Video Event Understanding using Natural Language Descriptions

Cited by: 20

Authors:

Ramanathan, Vignesh [1]

Liang, Percy [2]

Li Fei-Fei [2]

Affiliations:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2013

Keywords:

DOI:

10.1109/ICCV.2013.117

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which require extensive human effort. In this work, we propose a method to learn such models based on natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. There are two challenges with using this form of weak supervision. First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.


Pages: 905-912

Page count: 8
