SFM: A novel sequence-based fusion method for disease genes identification and prioritization

Cited by: 10
Authors
Yousef, Abdulaziz [1]
Charkari, Nasrollah Moghadam [1]
Affiliations
[1] Tarbiat Modares Univ, Fac Elect & Comp Engn, Tehran, Iran
Keywords
Classification; Disease gene; Protein; Physicochemical properties of amino acid; Fusion method; PROTEIN-PROTEIN INTERACTIONS; PREDICTION; FEATURES; AUTOCORRELATION; CLASSIFICATION; SIMILARITY; SURFACE;
DOI
10.1016/j.jtbi.2015.07.010
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Identifying disease genes in the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. These methods differ mainly in three respects: the prior knowledge used to construct the feature vector for each instance (gene); the way negative data (non-disease genes) are selected, given that no experimental approach exists to find them; and the classification method used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, which rely on noisy and incomplete prior knowledge, SFM uses the amino acid sequences of the proteins, which are universally available data, to represent the genes (proteins) as four different feature vectors. To select more likely negative data from the candidate genes, the intersection of four negative sets, each generated using a distance approach, is taken. Then a decision tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The proposed method was evaluated with standard measures. The results indicate a precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. (C) 2015 Elsevier Ltd. All rights reserved.
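The negative-selection step and the reported F-measure can be illustrated with a minimal Python sketch. The abstract only says that four negative sets are generated "using distance approach" and then intersected; ranking candidates by Euclidean distance from the centroid of the known disease genes is an assumption made here for illustration, as are the function names, the `keep` parameter and the toy data. The SVM predictors and the C4.5 fusion step are not shown.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def farthest_candidates(positives, candidates, keep):
    """Return the ids of the `keep` candidates farthest from the positive centroid.

    positives  -- list of feature vectors for known disease genes
    candidates -- dict mapping gene id -> feature vector
    Assumption: Euclidean distance to the centroid stands in for the
    unspecified "distance approach".
    """
    dim = len(positives[0])
    centroid = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
    ranked = sorted(
        candidates,
        key=lambda gid: math.dist(candidates[gid], centroid),
        reverse=True,
    )
    return set(ranked[:keep])


def likely_negatives(views, keep):
    """Intersect the per-view negative sets, one set per feature representation."""
    per_view = [farthest_candidates(pos, cand, keep) for pos, cand in views]
    return set.intersection(*per_view)


# Toy data: four hypothetical feature views of the same three candidate genes.
views = [
    ([[0.0, 0.0], [0.0, 2.0]], {"g1": [0.0, 1.0], "g2": [5.0, 1.0], "g3": [3.0, 1.0]}),
    ([[1.0, 1.0]], {"g1": [1.0, 2.0], "g2": [4.0, 5.0], "g3": [1.0, 1.0]}),
    ([[0.0]], {"g1": [0.5], "g2": [2.0], "g3": [3.0]}),
    ([[2.0, 2.0, 2.0]], {"g1": [2.0, 2.0, 9.0], "g2": [9.0, 2.0, 2.0], "g3": [2.0, 2.0, 2.0]}),
]

negatives = likely_negatives(views, keep=2)
reported_f = f_measure(0.826, 0.856)
```

In this toy setup only `g2` is far from the disease genes in all four views, so it is the only gene kept as a likely negative. The last line checks that the reported precision (82.6%) and recall (85.6%) are consistent with an F-measure of about 84%.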
Pages: 12-19
Number of pages: 8