Residual-Network-Based Supervised Gaze Prediction for First-Person Videos

Cited by: 1
Authors
Li, Yujie [1]
Ding, Shuxue [2]
Li, Xiang [3]
Tan, Benying [1, 3]
Kanemura, Atsunori [1, 4, 5]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058560, Japan
[2] Guilin Univ Elect Technol, Sch Artificial Intelligence, Guilin 541004, Peoples R China
[3] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9650005, Japan
[4] LeapMind Inc, Tokyo 1500044, Japan
[5] Adv Telecommun Res Inst Int, Kyoto 6190288, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Gaze prediction; first-person vision (FPV); saliency detection; convolution neural network (CNN); residual network; SALIENCY;
DOI
10.1109/ACCESS.2019.2913791
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Gaze prediction is an important problem in efficiently processing and understanding the large volume of incoming visual signals from first-person views (i.e., egocentric vision). Because many visual processes are expensive and human beings do not process the whole visual field, knowing the gaze position is an efficient way to understand the salient content of a video and what users pay attention to. However, current methods for gaze prediction are bottom-up and cannot incorporate information about user actions. We propose a supervised gaze prediction framework based on a residual network, which takes user actions into consideration when predicting gaze. Our model uses features extracted from the VGG-16 deep neural network to predict the gaze position in first-person vision (FPV) videos. Deep residual networks are combined with this model to learn the residual maps. The proposed method aims to predict gaze with high accuracy. According to the experimental results, its performance is competitive with that of state-of-the-art approaches.
Pages: 56208-56216
Number of pages: 9