Video Event Understanding using Natural Language Descriptions

Cited by: 20

Authors:

Ramanathan, Vignesh [1]

Liang, Percy [2]

Li Fei-Fei [2]

Affiliations:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2013

DOI:

10.1109/ICCV.2013.117

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which requires extensive human effort. In this work, we propose a method to learn such models based on natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. There are two challenges with using this form of weak supervision: First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.

Pages: 905-912

Page count: 8

