The impact of feature selection on one and two-class classification performance for plant microRNAs

Cited by: 12
Authors
Khalifa, Waleed [1, 2]
Yousef, Malik [1, 2]
Demirci, Muserref Duygu Sacar [3]
Allmer, Jens [3, 4]
Affiliations
[1] Coll Sakhnin, Comp Sci, Sakhnin, Israel
[2] Galilee Soc, Inst Appl Res, Shefa Amr, Israel
[3] Izmir Inst Technol, Mol Biol & Genet, Izmir, Turkey
[4] Bionia Inc, IZTEKGEB, Izmir, Turkey
Source
PEERJ | 2016 / Volume 4
Keywords
MicroRNA; Machine learning; Feature selection; Plant; One-class classification; Two-class classification; PREDICTION; SVM; MIRBASE;
DOI
10.7717/peerj.2135
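Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09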
Abstract
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This recognition ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is an involved process; therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Two-class classification (TCC) is generally used in the field, but because negative examples are hard to come by, one-class classification (OCC) has also been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, approximately 29% better than the worst method, which achieved 66.9% accuracy. OCC performance is comparable to that of TCC, which performs up to 3% better than OCC, but TCC is much less affected by feature selection; its largest performance gap is approximately 13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features.
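The contrast the abstract draws between OCC and TCC under feature selection can be illustrated with a minimal sketch. This is not the authors' pipeline: the classifiers (a centroid-plus-radius one-class model and a nearest-centroid two-class model), the two toy features, and all data values below are assumptions chosen only to show why a model trained on positives alone may be more sensitive to an uninformative feature.

```python
from statistics import mean

def select(rows, idx):
    """Keep only the feature columns listed in idx."""
    return [[r[i] for i in idx] for r in rows]

def centroid(rows):
    return [mean(col) for col in zip(*rows)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train_occ(pos):
    """One-class model: positives only; radius = largest training distance."""
    c = centroid(pos)
    return c, max(dist(p, c) for p in pos)

def predict_occ(model, x):
    c, radius = model
    return dist(x, c) <= radius

def train_tcc(pos, neg):
    """Two-class model: one centroid per class."""
    return centroid(pos), centroid(neg)

def predict_tcc(model, x):
    cp, cn = model
    return dist(x, cp) < dist(x, cn)

def accuracy(predict, model, pos, neg):
    correct = sum(predict(model, x) for x in pos)
    correct += sum(not predict(model, x) for x in neg)
    return correct / (len(pos) + len(neg))

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
train_pos = [[1.0, 0.0], [1.2, 10.0], [0.8, -10.0], [1.1, 5.0]]
train_neg = [[5.0, 1.0], [5.2, -9.0], [4.8, 9.0], [5.1, -4.0]]
test_pos = [[0.9, 3.0], [1.1, -7.0]]
test_neg = [[4.9, 2.0], [5.0, -6.0]]

def evaluate(idx):
    """Return (OCC accuracy, TCC accuracy) using only the features in idx."""
    tp, tn = select(train_pos, idx), select(train_neg, idx)
    ep, en = select(test_pos, idx), select(test_neg, idx)
    occ = accuracy(predict_occ, train_occ(tp), ep, en)
    tcc = accuracy(predict_tcc, train_tcc(tp, tn), ep, en)
    return occ, tcc

occ_all, tcc_all = evaluate([0, 1])  # no feature selection
occ_sel, tcc_sel = evaluate([0])     # informative feature only
print(occ_all, tcc_all, occ_sel, tcc_sel)  # 0.5 0.75 1.0 1.0 on this toy data
```

On this toy data, dropping the noise feature raises the one-class accuracy from 0.5 to 1.0 and the two-class accuracy from 0.75 to 1.0. The numbers only mirror the qualitative pattern described in the abstract and say nothing about the reported results.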
Pages: 13