Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation

Cited by: 23

Authors:

Park, Soyeon [1]

Kim, Bo-Kyeong [2]

Dong, Suh-Yeon [1]

Affiliations:

[1] Sookmyung Womens Univ, HCI Lab IT Engn, Seoul 04310, South Korea

[2] Nota Inc, Seoul 06212, South Korea

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2022 / Vol. 71

Funding:

National Research Foundation of Singapore;

Keywords:

Near-infrared (NIR); remote heart rate (HR) measurement; remote photoplethysmography (rPPG); RGB; self-supervised learning (SSL); video vision transformer (ViViT); HEART-RATE ESTIMATION; PHOTOPLETHYSMOGRAPHY;

DOI:

10.1109/TIM.2022.3217867

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Remote photoplethysmography (rPPG) is a technology that estimates heart rate (HR) without contact using facial videos. Because rPPG signals can be estimated at low cost, the technology is widely used for noncontact health monitoring. Recent rPPG-based HR estimation studies rely heavily on supervised feature learning on normal RGB videos. However, RGB-only methods are significantly affected by head movements and varying illumination conditions, and it is difficult to obtain large-scale labeled rPPG data, which determines the performance of supervised learning methods. To address these problems, we present the first-of-its-kind self-supervised transformer-based fusion learning framework for rPPG estimation. In our study, we propose an end-to-end fusion video vision transformer (Fusion ViViT) network that can extract long-range local and global spatiotemporal features from videos and convert them into video sequences to enhance the rPPG representation. In addition, the self-attention of the transformer integrates the spatiotemporal representations of complementary RGB and near-infrared (NIR), which, in turn, enables robust HR estimation even under complex conditions. We use contrastive learning as a self-supervised learning (SSL) scheme. We evaluate our framework on public datasets containing RGB videos, NIR videos, and physiological signals. Near-instant HR estimation (approximately 6 s) on the large-scale rPPG dataset with various scenarios yielded a root mean squared error (RMSE) of 14.86, which was competitive with the state-of-the-art accuracy for average HR (approximately 30 s). Furthermore, transfer learning results on the driving rPPG dataset showed stable HR estimation performance with an RMSE of 16.94, demonstrating that our framework can be utilized in the real world.

Pages: 10
