Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation

Cited by: 23
Authors
Park, Soyeon [1]
Kim, Bo-Kyeong [2]
Dong, Suh-Yeon [1]
Affiliations
[1] Sookmyung Womens Univ, HCI Lab IT Engn, Seoul 04310, South Korea
[2] Nota Inc, Seoul 06212, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Near-infrared (NIR); remote heart rate (HR) measurement; remote photoplethysmography (rPPG); RGB; self-supervised learning (SSL); video vision transformer (ViViT); HEART-RATE ESTIMATION; PHOTOPLETHYSMOGRAPHY;
DOI
10.1109/TIM.2022.3217867
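Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code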
0808 ; 0809 ;
Abstract
Remote photoplethysmography (rPPG) is a technology that estimates heart rate (HR) without contact from facial videos. Because rPPG signals can be estimated at low cost, the technology is widely used for noncontact health monitoring. Recent rPPG-based HR estimation studies rely heavily on supervised feature learning from ordinary RGB videos. However, RGB-only methods are significantly affected by head movements and varying illumination conditions, and it is difficult to obtain the large-scale labeled rPPG data on which the performance of supervised learning methods depends. To address these problems, we present the first self-supervised transformer-based fusion learning framework for rPPG estimation. We propose an end-to-end fusion video vision transformer (Fusion ViViT) network that can extract long-range local and global spatiotemporal features from videos and convert them into video sequences to enhance the rPPG representation. In addition, the self-attention of the transformer integrates the complementary spatiotemporal representations of RGB and near-infrared (NIR) videos, which, in turn, enables robust HR estimation even under complex conditions. We use contrastive learning as the self-supervised learning (SSL) scheme. We evaluate our framework on public datasets containing RGB videos, NIR videos, and physiological signals. Near-instant HR estimation (approximately 6 s) on the large-scale rPPG dataset with various scenarios yielded a root mean squared error (RMSE) of 14.86, which was competitive with the state-of-the-art accuracy for average HR (approximately 30 s). Furthermore, transfer learning on the driving rPPG dataset showed stable HR estimation performance with an RMSE of 16.94, demonstrating that our framework can be used in the real world.
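The abstract reports its results as RMSE between estimated and reference HR. As a reading aid, the following is a minimal sketch of how that metric is commonly computed over per-window HR values. The function name `rmse`, the sample values, and the assumption that each value corresponds to one short analysis window are illustrative only and are not taken from the paper.

```python
import math


def rmse(estimated, reference):
    """Root mean squared error between two equal-length sequences."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-window HR values (illustrative only, not from the paper).
estimated_hr = [72.0, 80.0, 65.0, 90.0]
reference_hr = [70.0, 84.0, 65.0, 87.0]

print(round(rmse(estimated_hr, reference_hr), 3))
```

Because the errors are squared before averaging, a few windows with large errors raise RMSE more than many windows with small errors, which matters when comparing estimates made over short and long windows.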
Pages: 10
Related papers
50 in total
  • [41] MS-DINO: Masked Self-Supervised Distributed Learning Using Vision Transformer
    Park, Sangjoon
    Lee, Ik Jae
    Kim, Jun Won
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6180 - 6192
  • [42] Enhance fashion classification of mosquito vector species via self-supervised vision transformer
    Kittichai, Veerayuth
    Kaewthamasorn, Morakot
    Chaiphongpachara, Tanawat
    Laojun, Sedthapong
    Saiwichai, Tawee
    Naing, Kaung Myat
    Tongloy, Teerawat
    Boonsang, Siridech
    Chuwongin, Santhad
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [43] Self-Supervised Video Super-Resolution by Spatial Constraint and Temporal Fusion
    Yang, Cuixin
    Luo, Hongming
    Liao, Guangsen
    Lu, Zitao
    Zhou, Fei
    Qiu, Guoping
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2021, 13021 : 249 - 260
  • [44] SOFT: Self-supervised sparse Optical Flow Transformer for video stabilization via quaternion
    Wang, Naiyao
    Zhou, Changdong
    Zhu, Rongfeng
    Zhang, Bo
    Wang, Ye
    Liu, Hongbo
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 130
  • [45] GLOCAL: A self-supervised learning framework for global and local motion estimation
    Zheng, Yihao
    Luo, Kunming
    Liu, Shuaicheng
    Li, Zun
    Xiang, Ye
    Wu, Lifang
    Zeng, Bing
    Chen, Chang Wen
    PATTERN RECOGNITION LETTERS, 2024, 178 : 91 - 97
  • [46] Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation
    Liu, Pengpeng
    Lyu, Michael R.
    King, Irwin
    Xu, Jia
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (09) : 5026 - 5041
  • [47] Supervised and self-supervised learning-based cascade spatiotemporal fusion framework and its application
    Sun, Weixuan
    Li, Jie
    Jiang, Menghui
    Yuan, Qiangqiang
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2023, 203 : 19 - 36
  • [48] SFT: Few-Shot Learning via Self-Supervised Feature Fusion With Transformer
    Lim, Jit Yan
    Lim, Kian Ming
    Lee, Chin Poo
    Tan, Yong Xuan
    IEEE ACCESS, 2024, 12 : 86690 - 86703
  • [49] Multi Self-Supervised Pre-Finetuned Transformer Fusion for Better Vehicle Detection
    Zheng, Juwu
    Ren, Jiangtao
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 2075 - 2089