Video Event Understanding using Natural Language Descriptions

Cited by: 20
Authors
Ramanathan, Vignesh [1]
Liang, Percy [2]
Li, Fei-Fei [2]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
10.1109/ICCV.2013.117
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which require extensive human effort. In this work, we propose a method to learn such models from natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. This form of weak supervision poses two challenges. First, the descriptions provide only a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.
Pages: 905-912
Page count: 8