High-throughput deep learning variant effect prediction with Sequence UNET

Cited by: 13
Authors
Dunham, Alistair S. [1 ,2 ]
Beltrao, Pedro [1 ,3 ]
AlQuraishi, Mohammed [4 ]
Affiliations
[1] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Genome Campus, Hinxton CB10 1SD, Cambs, England
[2] Wellcome Sanger Inst, Wellcome Genome Campus, Hinxton CB10 1RQ, Cambs, England
[3] Swiss Fed Inst Technol, Inst Mol Syst Biol, Dept Biol, CH-8093 Zurich, Switzerland
[4] Columbia Univ, Dept Syst Biol, New York, NY 10027 USA
Funding
Wellcome Trust (UK);
Keywords
Variant effect prediction; Deep learning; Mutation; PSSM; Pathogenicity; Machine learning; SERVER;
DOI
10.1186/s13059-023-02948-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Pages: 19
Related papers
50 in total
  • [31] Detecting genomic deletions from high-throughput sequence data with unsupervised learning
    Li X.
    Wu Y.
    BMC Bioinformatics, 2022, 23 (Suppl 8)
  • [32] WHAM: A High-Throughput Sequence Alignment Method
    Li, Yinan
    Patel, Jignesh M.
    Terrell, Allison
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2012, 37 (04):
  • [33] Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data
    Das, Subrata
    Biswas, Nidhan K.
    Basu, Analabha
    NUCLEIC ACIDS RESEARCH, 2023, 51 (14) : E75 - E75
  • [34] High-throughput Sequence Translation Using CUDA
    Sun Wei-dong
    Ma Zong-min
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS, VOLS 1-4, 2009, : 2022 - 2026
  • [35] Phigaro: high-throughput prophage sequence annotation
    Starikova, Elizaveta V.
    Tikhonova, Polina O.
    Prianichnikov, Nikita A.
    Rands, Chris M.
    Zdobnov, Evgeny M.
    Ilina, Elena N.
    Govorun, Vadim M.
    BIOINFORMATICS, 2020, 36 (12) : 3882 - 3884
  • [36] High-throughput DNA sequence data compression
    Zhu, Zexuan
    Zhang, Yongpeng
    Ji, Zhen
    He, Shan
    Yang, Xiao
    BRIEFINGS IN BIOINFORMATICS, 2015, 16 (01) : 1 - 15
  • [37] RUMMAGE - a high-throughput sequence annotation system
    Taudien, S
    Rump, A
    Platzer, M
    Drescher, B
    Schattevoy, R
    Gloeckner, G
    Dette, M
    Baumgart, C
    Weber, J
    Menzel, U
    Rosenthal, A
    TRENDS IN GENETICS, 2000, 16 (11) : 519 - 521
  • [38] Prediction of High-Throughput Protein-Protein Interactions based on Protein Sequence Information
    Li, Yixun
    Rezaei, Behzad
    Ngom, Alioune
    Rueda, Luis
    2015 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2015, : 163 - 168
  • [39] SOME INTRIGUING HIGH-THROUGHPUT DNA SEQUENCE VARIANTS PREDICTION OVER PROTEIN FUNCTIONALITY
    Kheirkhah, Atabak
    Daud, Salwani Mohd
    Salleh, Noor Azurati Ahmad
    Sam, Suriani Mohd
    Abas, Hafiza
    Shariff, Sya Azmeela
    Yusof, Yusnaidi Md
    JURNAL TEKNOLOGI-SCIENCES & ENGINEERING, 2016, 78 (6-4) : 1 - 6
  • [40] Deep Fish: Deep Learning-Based Classification of Zebrafish Deformation for High-Throughput Screening
    Ishaq, Omer
    Sadanandan, Sajith Kecheril
    Wahlby, Carolina
    SLAS DISCOVERY, 2017, 22 (01) : 102 - 107