A self-supervised deep learning method for data-efficient training in genomics

Authors
Hüseyin Anil Gündüz
Martin Binder
Xiao-Yin To
René Mreches
Bernd Bischl
Alice C. McHardy
Philipp C. Münch
Mina Rezaei
Affiliations
[1] LMU Munich, Department of Statistics
[2] Munich Center for Machine Learning
[3] Helmholtz Center for Infection Research, Department for Computational Biology of Infection Research
[4] Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS)
[5] German Center for Infection Research (DZIF), partner site Hannover Braunschweig
[6] Harvard School of Public Health, Department of Biostatistics
Abstract
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
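
The two ideas named in the abstract, reverse-complement sequences and prediction targets of different lengths, can be illustrated with a minimal Python sketch. This is not the Self-GenomeNet implementation: the helper `context_target_pairs` and its splitting rule (the last `t` bases become the target, the rest the context) are assumptions made purely for illustration, and no neural network or training loss is shown.

```python
# Illustrative sketch only; not the authors' Self-GenomeNet code.

# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Symbols outside A/C/G/T (for example N) are left unchanged.
    """
    return seq.translate(COMPLEMENT)[::-1]


def context_target_pairs(seq: str, target_lengths: list[int]) -> list[tuple[str, str]]:
    """Hypothetical helper: build (context, target) pairs from one sequence.

    For each target length t, the context is everything except the last
    t bases, and the target is the reverse complement of those last t bases.
    This splitting rule is an assumption for illustration.
    """
    pairs = []
    for t in target_lengths:
        if not 0 < t < len(seq):
            raise ValueError(f"target length {t} must be between 1 and {len(seq) - 1}")
        pairs.append((seq[:-t], reverse_complement(seq[-t:])))
    return pairs


if __name__ == "__main__":
    for context, target in context_target_pairs("AACGTT", [2, 4]):
        print(context, target)
```

Short targets pair a long context with a nearby stretch of sequence, while long targets force the pairing to span more distant positions, which is one way to read "short- and long-term dependencies" in the abstract.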