Video Event Understanding using Natural Language Descriptions

Cited by: 20

Authors:

Ramanathan, Vignesh [1]

Liang, Percy [2]

Li Fei-Fei [2]

Affiliations:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2013

Keywords:

DOI:

10.1109/ICCV.2013.117

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which require extensive human effort. In this work, we propose a method to learn such models from natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. This form of weak supervision poses two challenges. First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.

Citation:

Pages: 905-912

Page count: 8

50 items in total

[1] Translating Video Content to Natural Language Descriptions
Rohrbach, Marcus
Qiu, Wei
Titov, Ivan
Thater, Stefan
Pinkal, Manfred
Schiele, Bernt
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 433 - 440
[2] A framework for creating natural language descriptions of video streams
Khan, Muhammad Usman Ghani
Al Harbi, Nouf
Gotoh, Yoshihiko
INFORMATION SCIENCES, 2015, 303 : 61 - 82
[3] THE LANGUAGE OF EVENT DESCRIPTIONS
FRENCH, L
NELSON, K
BULLETIN OF THE BRITISH PSYCHOLOGICAL SOCIETY, 1984, 37 (FEB) : A29 - A30
[4] Natural language descriptions of human Behavior from video sequences
Tena, Carles Fernandez
Baiget, Pau
Roca, Xavier
Gonzalez, Jordi
KI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4667 : 279 - +
[5] Conceptual representations between video signals and natural language descriptions
Arens, M.
Gerber, R.
Nagel, H. -H.
IMAGE AND VISION COMPUTING, 2008, 26 (01) : 53 - 66
[6] Generating Natural Video Descriptions using Semantic Gate
Lee, Hyungmin
Kim, Il-Koo
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
[7] CAPP USING NATURAL-LANGUAGE PART DESCRIPTIONS
MASON, AK
OKHUYSEN, GA
JOURNAL OF SYSTEMS ENGINEERING, 1995, 5 (01) : 27 - 35
[8] Matchmaking Using Natural Language Descriptions Linking Customers with Enterprise Service Descriptions
Geldart, Joe
Song, William
Li, Yang
2009 IEEE 33RD INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 1049 - +
[9] Conversational natural language understanding interfacing city event information
Mast, M
Ross, T
Schulz, H
Harrikari, H
Demesticha, V
Polymenakos, L
Vamvakoulas, Y
Stadermann, J
DATA & KNOWLEDGE ENGINEERING, 2002, 42 (03) : 343 - 360
[10] USING NATURAL LANGUAGE DESCRIPTIONS TO IMPROVE THE USABILITY OF DATABASES.
HAFNER, CAROLE D.
JOYCE, JOHN D.
1600,
