Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation

Cited: 23
Authors
Park, Soyeon [1 ]
Kim, Bo-Kyeong [2 ]
Dong, Suh-Yeon [1 ]
Affiliations
[1] Sookmyung Womens Univ, HCI Lab IT Engn, Seoul 04310, South Korea
[2] Nota Inc, Seoul 06212, South Korea
Funding
National Research Foundation of Singapore
Keywords
Near-infrared (NIR); remote heart rate (HR) measurement; remote photoplethysmography (rPPG); RGB; self-supervised learning (SSL); video vision transformer (ViViT); HEART-RATE ESTIMATION; PHOTOPLETHYSMOGRAPHY;
DOI
10.1109/TIM.2022.3217867
CLC Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology]
Subject Classification
0808; 0809
Abstract
Remote photoplethysmography (rPPG) is a technology that estimates heart rate (HR) from facial videos without physical contact. Because rPPG estimation is low-cost, it is widely used for noncontact health monitoring. Recent rPPG-based HR estimation studies rely heavily on supervised feature learning from ordinary RGB videos. However, RGB-only methods are significantly affected by head movements and varying illumination conditions, and the large-scale labeled rPPG data that supervised methods require is difficult to obtain. To address these problems, we present the first self-supervised, transformer-based fusion learning framework for rPPG estimation. We propose an end-to-end fusion video vision transformer (Fusion ViViT) network that extracts long-range local and global spatiotemporal features from videos and converts them into video sequences to enhance the rPPG representation. In addition, the transformer's self-attention integrates the complementary spatiotemporal representations of RGB and near-infrared (NIR), which in turn enables robust HR estimation even under complex conditions. We use contrastive learning as the self-supervised learning (SSL) scheme. We evaluate our framework on public datasets containing RGB videos, NIR videos, and physiological signals. On a large-scale rPPG dataset covering various scenarios, near-instant HR estimation (approximately 6 s windows) achieved a root mean squared error (RMSE) of 14.86, competitive with state-of-the-art accuracy for average HR (approximately 30 s windows). Furthermore, transfer learning on a driving rPPG dataset yielded stable HR estimation with an RMSE of 16.94, demonstrating that our framework can be applied in the real world.
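The abstract describes the design only at a high level. As a rough illustration of the two ideas it combines, the sketch below (a minimal, hedged PyTorch example; every module name, shape, and hyperparameter is an assumption, not the authors' code) embeds RGB and NIR clips into tubelet tokens, fuses them with joint self-attention, and trains the pooled clip embedding with an InfoNCE-style contrastive loss:

```python
# Minimal sketch, not the authors' implementation: tubelet-embedded RGB and
# NIR token streams fused by joint self-attention (the Fusion ViViT idea),
# trained with an InfoNCE-style contrastive loss. All names, shapes, and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TubeletEmbed(nn.Module):
    """Embed a video (B, C, T, H, W) into spatiotemporal tokens via 3-D conv."""

    def __init__(self, in_ch, dim=128, tube=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tube, stride=tube)

    def forward(self, x):
        x = self.proj(x)                      # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence


class FusionViViTSketch(nn.Module):
    """Concatenate RGB and NIR tokens so self-attention mixes both modalities."""

    def __init__(self, dim=128, depth=4, heads=4):
        super().__init__()
        self.rgb_embed = TubeletEmbed(3, dim)
        self.nir_embed = TubeletEmbed(1, dim)  # NIR assumed single-channel
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, rgb, nir):
        tokens = torch.cat([self.rgb_embed(rgb), self.nir_embed(nir)], dim=1)
        fused = self.encoder(tokens)   # joint self-attention over both modalities
        return fused.mean(dim=1)       # (B, dim) clip-level embedding


def info_nce(z1, z2, tau=0.1):
    """Symmetric InfoNCE loss between two batches of view embeddings (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))


# Toy training step: a horizontal flip stands in for the (unspecified here)
# augmentations that would produce the two contrastive views.
model = FusionViViTSketch()
rgb = torch.randn(4, 3, 16, 64, 64)   # batch of 16-frame 64x64 RGB clips
nir = torch.randn(4, 1, 16, 64, 64)   # temporally aligned NIR clips
z1 = model(rgb, nir)
z2 = model(rgb.flip(-1), nir.flip(-1))
info_nce(z1, z2).backward()
```

In the paper, the fused representation would further be decoded into an rPPG signal for HR estimation; this sketch stops at the self-supervised embedding stage.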
Pages: 10
Related Papers
50 items in total
  • [31] Self-supervised Learning for Fusion of IR and RGB Images in Visual Teach and Repeat Navigation
    Liu, Xinyu
    Rozsypalek, Zdenek
    Krajnik, Tomas
    2023 EUROPEAN CONFERENCE ON MOBILE ROBOTS, ECMR, 2023, : 57 - 63
  • [32] Self-supervised fusion network for RGB-D interest point detection and description
    Li, Ningning
    Wang, Xiaomin
    Zheng, Zhou
    Sun, Zhendong
    PATTERN RECOGNITION, 2025, 158
  • [33] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [34] Pseudo-label enhancement for weakly supervised object detection using self-supervised vision transformer
    Yang, Kequan
    Wu, Yuanchen
    Li, Jide
    Yin, Chao
    Li, Xiaoqiang
    KNOWLEDGE-BASED SYSTEMS, 2025, 311
  • [35] Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
    Cao, Songjun
    Kang, Yueteng
    Fu, Yanzhe
    Xu, Xiaoshuo
    Sun, Sining
    Zhang, Yike
    Ma, Long
    INTERSPEECH 2021, 2021, : 706 - 710
  • [36] Decoupled spatiotemporal adaptive fusion network for self-supervised motion estimation
    Sun, Zitang
    Luo, Zhengbo
    Nishida, Shin'ya
    NEUROCOMPUTING, 2023, 534 : 133 - 146
  • [37] Lightweight Self-Supervised Monocular Depth Estimation Through CNN and Transformer Integration
    Wang, Zhe
    Zou, Yongjia
    Lv, Jin
    Cao, Yang
    Yu, Hongfei
    IEEE ACCESS, 2024, 12 : 167934 - 167943
  • [38] A self-supervised vision transformer to predict survival from histopathology in renal cell carcinoma
    Wessels, Frederik
    Schmitt, Max
    Krieghoff-Henning, Eva
    Nientiedt, Malin
    Waldbillig, Frank
    Neuberger, Manuel
    Kriegmair, Maximilian C.
    Kowalewski, Karl-Friedrich
    Worst, Thomas S.
    Steeg, Matthias
    Popovic, Zoran V.
    Gaiser, Timo
    von Kalle, Christof
    Utikal, Jochen S.
Fröhling, Stefan
    Michel, Maurice S.
    Nuhn, Philipp
    Brinker, Titus J.
    WORLD JOURNAL OF UROLOGY, 2023, 41 (08) : 2233 - 2241
  • [40] Vision Transformer-Based Self-supervised Learning for Ulcerative Colitis Grading in Colonoscopy
    Pyatha, Ajay
    Xu, Ziang
    Ali, Sharib
    DATA ENGINEERING IN MEDICAL IMAGING, DEMI 2023, 2023, 14314 : 102 - 110