Residual-Network-Based Supervised Gaze Prediction for First-Person Videos

Cited by: 1
Authors
Li, Yujie [1 ]
Ding, Shuxue [2 ]
Li, Xiang [3 ]
Tan, Benying [1 ,3 ]
Kanemura, Atsunori [1 ,4 ,5 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058560, Japan
[2] Guilin Univ Elect Technol, Sch Artificial Intelligence, Guilin 541004, Peoples R China
[3] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9650005, Japan
[4] LeapMind Inc, Tokyo 1500044, Japan
[5] Adv Telecommun Res Inst Int, Kyoto 6190288, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Gaze prediction; first-person vision (FPV); saliency detection; convolutional neural network (CNN); residual network; saliency;
DOI
10.1109/ACCESS.2019.2913791
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
Gaze prediction is a significant problem in efficiently processing and understanding the large volume of incoming visual signals in first-person views (i.e., egocentric vision). Because many visual processes are computationally expensive and human beings do not process the whole visual field, knowing the gaze position is an efficient way to understand the salient content of a video and what users pay attention to. However, current methods for gaze prediction are bottom-up and cannot incorporate information about user actions. We propose a supervised gaze prediction framework based on a residual network that takes user actions into consideration. Our model uses features extracted by the VGG-16 deep neural network to predict the gaze position in FPV videos. Deep residual networks are combined with this model to learn residual maps. The proposed method aims to achieve highly accurate gaze prediction. The experimental results show that the performance of the proposed method is competitive with that of state-of-the-art approaches.
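The architecture outlined in the abstract (VGG-16 convolutional features feeding residual blocks that regress a gaze map) can be sketched as follows. This is a minimal illustration in PyTorch, which the paper does not specify; the number of residual blocks, the single-channel heatmap head, and the bilinear upsampling step are assumptions made for the sketch, not the authors' exact configuration.

```python
# Minimal sketch (PyTorch assumed; not the authors' exact model):
# VGG-16 features -> residual blocks -> 1-channel gaze heatmap.
import torch
import torch.nn as nn
from torchvision import models


class ResidualBlock(nn.Module):
    """Standard two-convolution residual block with an identity skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The block learns a residual map that is added to its input.
        return self.relu(self.body(x) + x)


class GazePredictor(nn.Module):
    """VGG-16 feature extractor followed by residual blocks and a
    single-channel gaze-heatmap head (illustrative configuration)."""
    def __init__(self, num_blocks: int = 3):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features  # conv layers only; 512 output channels
        self.res_blocks = nn.Sequential(
            *[ResidualBlock(512) for _ in range(num_blocks)]
        )
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-pixel gaze score

    def forward(self, frames):
        feats = self.features(frames)          # (B, 512, H/32, W/32)
        feats = self.res_blocks(feats)
        heatmap = self.head(feats)
        # Upsample back to the frame resolution; supervised training would
        # compare this map against recorded gaze positions (e.g., MSE/BCE).
        return nn.functional.interpolate(
            heatmap, size=frames.shape[-2:], mode="bilinear",
            align_corners=False,
        )


# Usage: one RGB frame batch -> dense gaze-probability map.
model = GazePredictor()
frame = torch.randn(1, 3, 224, 224)
gaze_map = torch.sigmoid(model(frame))  # shape (1, 1, 224, 224)
```

In the supervised setting the abstract describes, the predicted map would be trained against ground-truth gaze fixations recorded with an eye tracker; the loss choice above is only an example.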
Pages: 56208-56216
Number of pages: 9