Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation

Cited: 23
Authors
Park, Soyeon [1 ]
Kim, Bo-Kyeong [2 ]
Dong, Suh-Yeon [1 ]
Affiliations
[1] Sookmyung Womens Univ, HCI Lab IT Engn, Seoul 04310, South Korea
[2] Nota Inc, Seoul 06212, South Korea
Funding
National Research Foundation of Singapore
Keywords
Near-infrared (NIR); remote heart rate (HR) measurement; remote photoplethysmography (rPPG); RGB; self-supervised learning (SSL); video vision transformer (ViViT); HEART-RATE ESTIMATION; PHOTOPLETHYSMOGRAPHY;
DOI
10.1109/TIM.2022.3217867
CLC Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology]
Subject Classification
0808; 0809
Abstract
Remote photoplethysmography (rPPG) is a technology that estimates heart rate (HR) from facial videos without physical contact. Because rPPG estimation is low-cost, it is widely used for noncontact health monitoring. Recent rPPG-based HR estimation studies rely heavily on supervised feature learning from ordinary RGB videos. However, RGB-only methods are significantly affected by head movements and varying illumination conditions, and the large-scale labeled rPPG data that supervised methods require is difficult to obtain. To address these problems, we present the first self-supervised, transformer-based fusion learning framework for rPPG estimation. We propose an end-to-end fusion video vision transformer (Fusion ViViT) network that extracts long-range local and global spatiotemporal features from videos and converts them into video sequences to enhance the rPPG representation. In addition, the transformer's self-attention integrates the complementary spatiotemporal representations of RGB and near-infrared (NIR), which in turn enables robust HR estimation even under complex conditions. We use contrastive learning as the self-supervised learning (SSL) scheme. We evaluate our framework on public datasets containing RGB videos, NIR videos, and physiological signals. On a large-scale rPPG dataset covering various scenarios, near-instant HR estimation (approximately 6 s windows) achieved a root mean squared error (RMSE) of 14.86, competitive with state-of-the-art accuracy for average HR (approximately 30 s windows). Furthermore, transfer learning on a driving rPPG dataset yielded stable HR estimation with an RMSE of 16.94, demonstrating that our framework can be applied in the real world.
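The abstract describes the design only at a high level. As a rough illustration of the two ideas it combines, the sketch below (a minimal, hedged PyTorch example; every module name, shape, and hyperparameter is an assumption, not the authors' code) embeds RGB and NIR clips into tubelet tokens, fuses them with joint self-attention, and trains the pooled clip embedding with an InfoNCE-style contrastive loss:

```python
# Minimal sketch, not the authors' implementation: tubelet-embedded RGB and
# NIR token streams fused by joint self-attention (the Fusion ViViT idea),
# trained with an InfoNCE-style contrastive loss. All names, shapes, and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TubeletEmbed(nn.Module):
    """Embed a video (B, C, T, H, W) into spatiotemporal tokens via 3-D conv."""

    def __init__(self, in_ch, dim=128, tube=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tube, stride=tube)

    def forward(self, x):
        x = self.proj(x)                      # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence


class FusionViViTSketch(nn.Module):
    """Concatenate RGB and NIR tokens so self-attention mixes both modalities."""

    def __init__(self, dim=128, depth=4, heads=4):
        super().__init__()
        self.rgb_embed = TubeletEmbed(3, dim)
        self.nir_embed = TubeletEmbed(1, dim)  # NIR assumed single-channel
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, rgb, nir):
        tokens = torch.cat([self.rgb_embed(rgb), self.nir_embed(nir)], dim=1)
        fused = self.encoder(tokens)   # joint self-attention over both modalities
        return fused.mean(dim=1)       # (B, dim) clip-level embedding


def info_nce(z1, z2, tau=0.1):
    """Symmetric InfoNCE loss between two batches of view embeddings (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))


# Toy training step: a horizontal flip stands in for the (unspecified here)
# augmentations that would produce the two contrastive views.
model = FusionViViTSketch()
rgb = torch.randn(4, 3, 16, 64, 64)   # batch of 16-frame 64x64 RGB clips
nir = torch.randn(4, 1, 16, 64, 64)   # temporally aligned NIR clips
z1 = model(rgb, nir)
z2 = model(rgb.flip(-1), nir.flip(-1))
info_nce(z1, z2).backward()
```

In the paper, the fused representation would further be decoded into an rPPG signal for HR estimation; this sketch stops at the self-supervised embedding stage.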
Pages: 10
Related Papers
50 items in total
  • [31] Self-supervised Learning for Fusion of IR and RGB Images in Visual Teach and Repeat Navigation
    Liu, Xinyu
    Rozsypalek, Zdenek
    Krajnik, Tomas
    2023 EUROPEAN CONFERENCE ON MOBILE ROBOTS, ECMR, 2023, : 57 - 63
  • [32] Self-supervised fusion network for RGB-D interest point detection and description
    Li, Ningning
    Wang, Xiaomin
    Zheng, Zhou
    Sun, Zhendong
    PATTERN RECOGNITION, 2025, 158
  • [33] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [34] Pseudo-label enhancement for weakly supervised object detection using self-supervised vision transformer
    Yang, Kequan
    Wu, Yuanchen
    Li, Jide
    Yin, Chao
    Li, Xiaoqiang
    KNOWLEDGE-BASED SYSTEMS, 2025, 311
  • [35] Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
    Cao, Songjun
    Kang, Yueteng
    Fu, Yanzhe
    Xu, Xiaoshuo
    Sun, Sining
    Zhang, Yike
    Ma, Long
    INTERSPEECH 2021, 2021, : 706 - 710
  • [36] Decoupled spatiotemporal adaptive fusion network for self-supervised motion estimation
    Sun, Zitang
    Luo, Zhengbo
    Nishida, Shin'ya
    NEUROCOMPUTING, 2023, 534 : 133 - 146
  • [37] Lightweight Self-Supervised Monocular Depth Estimation Through CNN and Transformer Integration
    Wang, Zhe
    Zou, Yongjia
    Lv, Jin
    Cao, Yang
    Yu, Hongfei
    IEEE ACCESS, 2024, 12 : 167934 - 167943
  • [38] A self-supervised vision transformer to predict survival from histopathology in renal cell carcinoma
    Wessels, Frederik
    Schmitt, Max
    Krieghoff-Henning, Eva
    Nientiedt, Malin
    Waldbillig, Frank
    Neuberger, Manuel
    Kriegmair, Maximilian C.
    Kowalewski, Karl-Friedrich
    Worst, Thomas S.
    Steeg, Matthias
    Popovic, Zoran V.
    Gaiser, Timo
    von Kalle, Christof
    Utikal, Jochen S.
Fröhling, Stefan
    Michel, Maurice S.
    Nuhn, Philipp
    Brinker, Titus J.
    WORLD JOURNAL OF UROLOGY, 2023, 41 (08) : 2233 - 2241
  • [40] Vision Transformer-Based Self-supervised Learning for Ulcerative Colitis Grading in Colonoscopy
    Pyatha, Ajay
    Xu, Ziang
    Ali, Sharib
    DATA ENGINEERING IN MEDICAL IMAGING, DEMI 2023, 2023, 14314 : 102 - 110