Video Event Understanding using Natural Language Descriptions

Cited by: 20
Authors
Ramanathan, Vignesh [1]
Liang, Percy [2]
Li, Fei-Fei [2]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
10.1109/ICCV.2013.117
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which require extensive human effort. In this work, we propose a method to learn such models based on natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. There are two challenges with using this form of weak supervision. First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.
Pages: 905-912
Number of pages: 8