Multistream Gaze Estimation with Anatomical Eye Region Isolation by Synthetic to Real Transfer Learning

Cited by: 1
|
Authors
Mahmud Z. [1 ,3 ]
Hungler P. [3 ]
Etemad A. [1 ,3 ]
Affiliations
[1] Queen's University, Kingston, Ontario
[2] Queen's University, Kingston, Ontario
Keywords
deep neural network; domain randomization; estimation; eye region segmentation; feature extraction; gaze estimation; head; iris; lighting; multistream network; synthetic data; training; transfer learning
DOI
10.1109/TAI.2024.3366174
Abstract
We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by taking advantage of eye anatomy information through a multistream framework. Our proposed solution comprises two components: a network for isolating anatomical eye regions, and a second network for multistream gaze estimation. The eye region isolation is performed with a U-Net style network, which we train using a synthetic dataset containing eye region masks for the visible eyeball and the iris region. The synthetic dataset used in this stage is procured using the UnityEyes simulator and consists of 80,000 eye images. After training, the eye region isolation network is transferred to the real domain to generate masks for real-world eye images. To make this transfer successful, we exploit domain randomization in the training process, which allows the synthetic images to benefit from a larger variance with the help of augmentations that resemble artifacts. The generated eye region masks, along with the raw eye images, are then used together as a multistream input to our gaze estimation network, which consists of wide residual blocks. The output embeddings from these encoders are fused in the channel dimension before being fed into the gaze regression layers. We evaluate our framework on three gaze estimation datasets and achieve strong performance. Our method surpasses the state-of-the-art by 7.57% and 1.85% on two datasets, and obtains competitive results on the third. We also study the robustness of our method with respect to noise in the data and demonstrate that our model is less sensitive to noisy data. Lastly, we perform a variety of experiments, including ablation studies, to evaluate the contribution of different components and design choices in our solution.
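The channel-dimension fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of streams (raw eye image, eyeball mask, iris mask), the embedding shapes, and the single-layer regression head are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs: one embedding per input stream
# (raw eye image, visible-eyeball mask, iris mask), each of shape
# (channels, height, width) after the wide residual encoders.
streams = [rng.standard_normal((64, 4, 6)) for _ in range(3)]

# Fuse the stream embeddings in the channel dimension:
# three (64, 4, 6) tensors -> one (192, 4, 6) tensor.
fused = np.concatenate(streams, axis=0)

# Stand-in gaze regression head: flatten the fused features and
# apply one linear map to a 2-D gaze output (e.g., pitch and yaw).
w = rng.standard_normal((2, fused.size)) * 0.01
b = np.zeros(2)
gaze = w @ fused.ravel() + b

print(fused.shape)  # (192, 4, 6)
print(gaze.shape)   # (2,)
```

Concatenating along the channel axis keeps each stream's spatial layout intact, so the regression layers can still weigh anatomically aligned features (e.g., the iris region) against the raw appearance at the same spatial location.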
Pages: 1-15
Page count: 14
Related Papers
50 items total
  • [41] Automatic Eye Type Detection in Retinal Fundus Image Using Fusion of Transfer Learning and Anatomical Features
    Roy, Pallab Kanti
    Chakravorty, Rajib
    Sedai, Suman
    Mahapatra, Dwarikanath
    Garnavi, Rahil
    2016 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2016, : 538 - 544
  • [42] Robust deep learning for eye fundus images: Bridging real and synthetic data for enhancing generalization
    Oliveira, Guilherme C.
    Rosa, Gustavo H.
    Pedronette, Daniel C.G.
    Papa, João P.
    Kumar, Himeesh
    Passos, Leandro A.
    Kumar, Dinesh
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 94
  • [44] Real-Time Estimation of Eye Movement Condition Using a Deep Learning Model
    Sugiura, Akihiro
    Itazu, Yoshiki
    Tanaka, Kunihiko
    Takada, Hiroki
    HCI INTERNATIONAL 2021 - LATE BREAKING PAPERS: MULTIMODALITY, EXTENDED REALITY, AND ARTIFICIAL INTELLIGENCE, 2021, 13095 : 132 - 143
  • [45] Estimation of behavioral user state based on eye gaze and head pose-application in an e-learning environment
    Asteriadis, Stylianos
    Tzouveli, Paraskevi
    Karpouzis, Kostas
    Kollias, Stefanos
    MULTIMEDIA TOOLS AND APPLICATIONS, 2009, 41 (03) : 469 - 493
  • [46] Transfer Learning from Synthetic to Real-Noise Denoising with Adaptive Instance Normalization
    Kim, Yoonsik
    Soh, Jae Woong
    Park, Gu Yong
    Cho, Nam Ik
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3479 - 3489
  • [47] TRANSFER LEARNING FROM SYNTHETIC TO REAL IMAGES USING VARIATIONAL AUTOENCODERS FOR PRECISE POSITION DETECTION
    Inoue, Tadanobu
    Chaudhury, Subhajit
    De Magistris, Giovanni
    Dasgupta, Sakyasingha
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2725 - 2729
  • [48] Driver Gaze Zone Estimation Based on Three-Channel Convolution-Optimized Vision Transformer With Transfer Learning
    Li, Zhao
    Jiang, Siyang
    Fu, Rui
    Guo, Yingshi
    Wang, Chang
    IEEE SENSORS JOURNAL, 2024, 24 (24) : 42064 - 42078
  • [49] Get a Grip: Slippage-Robust and Glint-Free Gaze Estimation for Real-Time Pervasive Head-Mounted Eye Tracking
    Santini, Thiago
    Niehorster, Diederick C.
    Kasneci, Enkelejda
    ETRA 2019: 2019 ACM SYMPOSIUM ON EYE TRACKING RESEARCH & APPLICATIONS, 2019,
  • [50] A robust, real-time camera-based eye gaze tracking system to analyze users' visual attention using deep learning
    Singh, Jaiteg
    Modi, Nandini
    INTERACTIVE LEARNING ENVIRONMENTS, 2024, 32 (02) : 409 - 430