Residual-Network-Based Supervised Gaze Prediction for First-Person Videos

Cited by: 1
Authors
Li, Yujie [1 ]
Ding, Shuxue [2 ]
Li, Xiang [3 ]
Tan, Benying [1 ,3 ]
Kanemura, Atsunori [1 ,4 ,5 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058560, Japan
[2] Guilin Univ Elect Technol, Sch Artificial Intelligence, Guilin 541004, Peoples R China
[3] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9650005, Japan
[4] LeapMind Inc, Tokyo 1500044, Japan
[5] Adv Telecommun Res Inst Int, Kyoto 6190288, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Gaze prediction; first-person vision (FPV); saliency detection; convolutional neural network (CNN); residual network; saliency;
DOI
10.1109/ACCESS.2019.2913791
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
Gaze prediction is a significant problem in efficiently processing and understanding the large volume of incoming visual signals in first-person views (i.e., egocentric vision). Because many visual processes are computationally expensive and human beings do not process the whole visual field, knowing the gaze position is an efficient way to understand the salient content of a video and what users pay attention to. However, current methods for gaze prediction are bottom-up and cannot incorporate information about user actions. We propose a supervised gaze prediction framework based on a residual network that takes user actions into consideration. Our model uses features extracted by the VGG-16 deep neural network to predict the gaze position in FPV videos. Deep residual networks are combined with this model to learn residual maps. The proposed method aims to achieve highly accurate gaze prediction. The experimental results show that the performance of the proposed method is competitive with that of state-of-the-art approaches.
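The architecture outlined in the abstract (VGG-16 convolutional features feeding residual blocks that regress a gaze map) can be sketched as follows. This is a minimal illustration in PyTorch, which the paper does not specify; the number of residual blocks, the single-channel heatmap head, and the bilinear upsampling step are assumptions made for the sketch, not the authors' exact configuration.

```python
# Minimal sketch (PyTorch assumed; not the authors' exact model):
# VGG-16 features -> residual blocks -> 1-channel gaze heatmap.
import torch
import torch.nn as nn
from torchvision import models


class ResidualBlock(nn.Module):
    """Standard two-convolution residual block with an identity skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The block learns a residual map that is added to its input.
        return self.relu(self.body(x) + x)


class GazePredictor(nn.Module):
    """VGG-16 feature extractor followed by residual blocks and a
    single-channel gaze-heatmap head (illustrative configuration)."""
    def __init__(self, num_blocks: int = 3):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features  # conv layers only; 512 output channels
        self.res_blocks = nn.Sequential(
            *[ResidualBlock(512) for _ in range(num_blocks)]
        )
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-pixel gaze score

    def forward(self, frames):
        feats = self.features(frames)          # (B, 512, H/32, W/32)
        feats = self.res_blocks(feats)
        heatmap = self.head(feats)
        # Upsample back to the frame resolution; supervised training would
        # compare this map against recorded gaze positions (e.g., MSE/BCE).
        return nn.functional.interpolate(
            heatmap, size=frames.shape[-2:], mode="bilinear",
            align_corners=False,
        )


# Usage: one RGB frame batch -> dense gaze-probability map.
model = GazePredictor()
frame = torch.randn(1, 3, 224, 224)
gaze_map = torch.sigmoid(model(frame))  # shape (1, 1, 224, 224)
```

In the supervised setting the abstract describes, the predicted map would be trained against ground-truth gaze fixations recorded with an eye tracker; the loss choice above is only an example.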
Pages: 56208-56216
Number of pages: 9