EmbedCaps-DBP: Predicting DNA-Binding Proteins Using Protein Sequence Embedding and Capsule Network

Cited by: 3
Authors
Naim, Muhammad Khaerul [1, 3]
Mengko, Tati Rajab [1]
Hertadi, Rukman [2]
Purwarianti, Ayu [1, 4]
Susanty, Meredita [1, 5]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Fac Math & Nat Sci, Bandung 40132, Indonesia
[3] Universal Univ, Dept Informat Engn, Batam 29433, Indonesia
[4] Bandung Inst Technol, Ctr Artificial Intelligence U CoE AI VLB, Bandung 40132, Indonesia
[5] Univ Pertamina, Dept Comp Sci, Jakarta 12220, Indonesia
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Protein sequence; Training; Amino acids; Transformers; Feature extraction; Task analysis; Biological system modeling; DNA; Machine learning; Capsule network; DNA-binding proteins; deep learning; machine learning; protein sequence embeddings; IDENTIFICATION; RESIDUES; PSEAAC; DPP;
DOI
10.1109/ACCESS.2023.3328960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
DNA-binding interactions are an essential biological activity with important functions, such as DNA replication, transcription, repair, and recombination. DNA-binding proteins (DBPs) have been strongly associated with various human diseases, such as asthma, cancer, and HIV/AIDS. Therefore, some DBPs are used in the pharmaceutical industry to produce antibiotics, anticancer drugs, and anti-inflammatory drugs. Most previous methods have used evolutionary information to predict DBPs. However, these methods have high computing costs and produce unsatisfactory results. This study presents EmbedCaps-DBP, a new method for improving DBP prediction. First, we used three protein sequence embeddings (ProtT5, ESM-1b, and ESM-2) to extract learned feature representations from protein sequences. Those embedding methods can capture important information about amino acids, such as biophysics, biochemistry, structure, and domains, that have not been fully utilized in protein annotation tasks. Then, we used a 1D-capsule network (CapsNet) as a classifier. EmbedCaps-DBP significantly outperformed all existing classifiers in training and independent datasets. Based on two independent datasets, EmbedCaps-DBP (ProtT5) achieved 12.65% and 0.33% higher accuracies than a recent predictor on PDB2272 and PDB186, respectively. These results indicate that our proposed method is a promising predictor of DBPs.
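The two stages named in the abstract (a fixed-length sequence embedding followed by a capsule classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random matrix stands in for a real ProtT5 embedding, and the mean pooling, capsule counts, capsule dimensions, untrained weights, and three routing iterations are all assumptions chosen for illustration. It shows only the squash nonlinearity and routing-by-agreement that capsule networks commonly use.

```python
import numpy as np

def mean_pool(residue_embeddings):
    """Collapse an (L, D) per-residue matrix into one D-dim protein vector (assumed pooling)."""
    return residue_embeddings.mean(axis=0)

def squash(s, axis=-1, eps=1e-9):
    """Capsule nonlinearity: keeps the vector direction, maps its norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_iter=3):
    """Routing-by-agreement.

    u_hat: (n_in, n_out, d_out) prediction vectors from input capsules.
    Returns (n_out, d_out) output capsules.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                        # routing logits
    for _ in range(n_iter):
        c = softmax(b, axis=1)                         # coupling coefficients per input capsule
        v = squash(np.einsum("io,iod->od", c, u_hat))  # weighted sum, then squash
        b = b + np.einsum("iod,od->io", u_hat, v)      # reward predictions that agree with v
    return v

rng = np.random.default_rng(0)

# Stand-in for the per-residue embedding of a 120-residue protein (hypothetical data;
# 1024 is assumed as the embedding width).
emb = rng.normal(size=(120, 1024))
x = mean_pool(emb)                                     # shape (1024,)

# Assumed layout: 128 primary capsules of dimension 8, 2 class capsules of dimension 16.
primary = squash(x.reshape(128, 8))
W = rng.normal(scale=0.1, size=(128, 2, 16, 8))        # untrained transformation weights
u_hat = np.einsum("iodk,ik->iod", W, primary)          # shape (128, 2, 16)

v = route(u_hat)
class_scores = np.linalg.norm(v, axis=-1)              # capsule length per class (DBP / non-DBP)
```

In a trained capsule classifier, the length of each class capsule is read as the score for that class; here the weights are random, so the scores carry no meaning beyond demonstrating shapes and value ranges.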
Pages: 121256-121268
Page count: 13
Related papers
50 in total
  • [41] Using hidden Markov models to predict DNA-binding proteins with sequence and structure information
    Hsu, Yi-Yu
    Chen, Wei-Jhih
    Chen, Shu-Hui
    Kao, Hung-Yu
    SOFT COMPUTING, 2014, 18 (12) : 2365 - 2376
  • [43] Predicting the Sequence Specificities of DNA-Binding Proteins by DNA Fine-Tuned Language Model With Decaying Learning Rates
    He, Ying
    Zhang, Qinhu
    Wang, Siguo
    Chen, Zhanheng
    Cui, Zhen
    Guo, Zhen-Hao
    Huang, De-Shuang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (01) : 616 - 624
  • [44] An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins
    Nanni, Loris
    Lumini, Alessandra
    AMINO ACIDS, 2009, 36 (02) : 167 - 175
  • [46] Identification of a Nonstructural DNA-Binding Protein (DBP) as an Antigen with Diagnostic Potential for Human Adenovirus
    Guo, Li
    Wu, Chengjun
    Zhou, Hongli
    Wu, Chao
    Paranhos-Baccala, Glaucia
    Vernet, Guy
    Jin, Qi
    Wang, Jianwei
    Hung, Tao
    PLOS ONE, 2013, 8 (03):
  • [47] Predicting Target DNA Sequences of DNA-Binding Proteins Based on Unbound Structures
    Chen, Chien-Yu
    Chien, Ting-Ying
    Lin, Chih-Kang
    Lin, Chih-Wei
    Weng, Yi-Zhong
    Chang, Darby Tien-Hao
    PLOS ONE, 2012, 7 (02):
  • [48] Protein-Induced DNA Unwinding is An Intrinsic Feature of Certain Sequence-Specific DNA-Binding Proteins
    Leng, Fenfei
    Chen, Bo
    BIOPHYSICAL JOURNAL, 2009, 96 (03) : 414A - 414A
  • [49] Identification of DNA-Binding and Protein-Binding Proteins Using Enhanced Graph Wavelet Features
    Zhu, Yuan
    Zhou, Weiqiang
    Dai, Dao-Qing
    Yan, Hong
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (04) : 1017 - 1031
  • [50] The karyosphere capsule in Rana temporaria oocytes contains structural and DNA-binding proteins
    Ilicheva, Nadya
    Podgornaya, Olga
    Bogolyubov, Dmitry
    Pochukalina, Galina
    NUCLEUS, 2018, 9 (01) : 516 - 529