Egocentric action anticipation from untrimmed videos

Cited: 0
Authors
Rodin, Ivan [1 ]
Furnari, Antonino [1 ,2 ]
Farinella, Giovanni Maria [1 ,2 ]
Affiliations
[1] Univ Catania, Catania, Italy
[2] Univ Catania, Next Vis srl Spinoff, Catania, Italy
Keywords
computer vision; pattern recognition;
DOI
10.1049/cvi2.12342
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to deal with 'untrimmed' video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through an experimental evaluation testing a variety of models, the authors aim to better understand model performance in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
Pages: 11
Related papers
50 records in total
  • [21] Streaming egocentric action anticipation: An evaluation scheme and approach
    Furnari, Antonino
    Farinella, Giovanni Maria
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 234
  • [22] Egocentric Action Anticipation Based on Unsupervised Gaze Estimation
    Zhong, Cengsi
    Fang, Zhijun
    Gao, Yongbin
    Huang, Bo
    Wuhan University Journal of Natural Sciences, 2021, 26 (03) : 207 - 214
  • [23] Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos
    Gao, Zhanning
    Wang, Le
    Zhang, Qilin
    Niu, Zhenxing
    Zheng, Nanning
    Hua, Gang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8328 - 8335
  • [24] Deep Learning-Based Action Detection in Untrimmed Videos: A Survey
    Vahdani, Elahe
    Tian, Yingli
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4302 - 4320
  • [25] Confidence-Guided Self Refinement for Action Prediction in Untrimmed Videos
    Hou, Jingyi
    Wu, Xinxiao
    Wang, Ruiqi
    Luo, Jiebo
    Jia, Yunde
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 6017 - 6031
  • [26] AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
    Shou, Zheng
    Gao, Hang
    Zhang, Lei
    Miyazawa, Kazuyuki
    Chang, Shih-Fu
    COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 : 162 - 179
  • [27] Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks
    Zhang, Mengmi
    Ma, Keng Teck
    Lim, Joo Hwee
    Zhao, Qi
    Feng, Jiashi
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3539 - 3548
  • [28] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
    Yang, Min
    Gao, Huan
    Guo, Ping
    Wang, Limin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 18570 - 18579
  • [29] Detecting Hands in Egocentric Videos: Towards Action Recognition
    Cartas, Alejandro
    Dimiccoli, Mariella
    Radeva, Petia
    COMPUTER AIDED SYSTEMS THEORY - EUROCAST 2017, PT II, 2018, 10672 : 330 - 338
  • [30] ACTION RECOGNITION IN RGB-D EGOCENTRIC VIDEOS
    Tang, Yansong
    Tian, Yi
    Lu, Jiwen
    Feng, Jianjiang
    Zhou, Jie
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3410 - 3414