AMEGO: Active Memory from Long EGOcentric Videos

Cited by: 0
Authors
Goletto, Gabriele [1 ]
Nagarajan, Tushar [2 ]
Averta, Giuseppe [1 ]
Damen, Dima [3 ]
Affiliations
[1] Politecn Torino, Turin, Italy
[2] Meta, FAIR, Austin, TX USA
[3] Univ Bristol, Bristol, Avon, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information from a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
Pages: 92-110
Page count: 19
Related Papers
50 records total
  • [1] Next-active-object prediction from egocentric videos
    Furnari, Antonino
    Battiato, Sebastiano
    Grauman, Kristen
    Farinella, Giovanni Maria
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 49 : 401 - 411
  • [2] Anticipating Next Active Objects for Egocentric Videos
    Thakur, Sanket Kumar
    Beyan, Cigdem
    Morerio, Pietro
    Murino, Vittorio
    del Bue, Alessio
    IEEE ACCESS, 2024, 12 : 61767 - 61779
  • [3] Generating Personalized Summaries of Day Long Egocentric Videos
    Nagar, Pravin
    Rathore, Anuj
    Jawahar, C. V.
    Arora, Chetan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 6832 - 6845
  • [4] Grounded Question-Answering in Long Egocentric Videos
    Di, Shangzhe
    Xie, Weidi
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 12934 - 12943
  • [5] Anonymizing Egocentric Videos
    Thapar, Daksh
    Nigam, Aditya
    Arora, Chetan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2300 - 2309
  • [6] Head Motion Signatures from Egocentric Videos
    Poleg, Yair
    Arora, Chetan
    Peleg, Shmuel
    COMPUTER VISION - ACCV 2014, PT III, 2015, 9005 : 315 - 329
  • [7] Generic Action Recognition from Egocentric Videos
    Singh, Suriya
    Arora, Chetan
    Jawahar, C. V.
    2015 FIFTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2015,
  • [8] Learning Navigation Subroutines from Egocentric Videos
    Kumar, Ashish
    Gupta, Saurabh
    Malik, Jitendra
    CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [9] Market basket analysis from egocentric videos
    Santarcangelo, Vito
    Farinella, Giovanni Maria
    Furnari, Antonino
    Battiato, Sebastiano
    PATTERN RECOGNITION LETTERS, 2018, 112 : 83 - 90
  • [10] Recognizing Personal Locations From Egocentric Videos
    Furnari, Antonino
    Farinella, Giovanni Maria
    Battiato, Sebastiano
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2017, 47 (01) : 6 - 18