The impact of feature selection on one and two-class classification performance for plant microRNAs

Times cited: 12
Authors
Khalifa, Waleed [1 ,2 ]
Yousef, Malik [1 ,2 ]
Demirci, Muserref Duygu Sacar [3 ]
Allmer, Jens [3 ,4 ]
Affiliations
[1] Coll Sakhnin, Comp Sci, Sakhnin, Israel
[2] Galilee Soc, Inst Appl Res, Shefa Amr, Israel
[3] Izmir Inst Technol, Mol Biol & Genet, Izmir, Turkey
[4] Bionia Inc, IZTEKGEB, Izmir, Turkey
Source
PEERJ | 2016 / Vol. 4
Keywords
MicroRNA; Machine learning; Feature selection; Plant; One-class classification; Two-class classification; PREDICTION; SVM; MIRBASE;
DOI
10.7717/peerj.2135
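Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract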
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is laborious, and therefore machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterizing the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, about 29 percentage points better than the worst method, which achieved 66.9% accuracy. While OCC performance is comparable to that of TCC, which performs up to 3% better, TCC is much less affected by feature selection: its largest performance gap is about 13%, and this occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that, given the proper set of features, OCC can perform on par with TCC.
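The experimental shape described in the abstract (OCC trained on positive examples only, TCC trained on both classes, each evaluated with and without feature selection) can be illustrated with a small, self-contained Python sketch. It is not the authors' pipeline: the paper's eight feature selection methods, its classifiers and its plant pre-miRNA features are not reproduced here. Instead, as assumptions for illustration only, it uses synthetic data (`make_data`), a Fisher-style per-feature score as a stand-in filter method (`fisher_scores`, `top_k`), a distance-to-centroid threshold as a stand-in one-class classifier (`OneClassCentroid`), and a nearest-centroid rule as a stand-in two-class classifier (`NearestCentroid`). All of these names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def make_data(rng, n, n_features, n_informative, shift):
    """Hypothetical synthetic data, not real pre-miRNA features.

    Positives are shifted by `shift` on the first `n_informative` features;
    every other feature is pure noise for both classes.
    """
    pos = [[rng.gauss(shift if j < n_informative else 0.0, 1.0) for j in range(n_features)]
           for _ in range(n)]
    neg = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    return pos, neg


def fisher_scores(pos, neg):
    """Stand-in filter method: |mean_pos - mean_neg| / (sd_pos + sd_neg) per feature."""
    scores = []
    for p_col, n_col in zip(zip(*pos), zip(*neg)):
        spread = pstdev(p_col) + pstdev(n_col)
        scores.append(abs(mean(p_col) - mean(n_col)) / spread if spread else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def project(rows, idx):
    """Keep only the feature columns listed in idx."""
    return [[row[j] for j in idx] for row in rows]


def centroid(rows):
    return [mean(col) for col in zip(*rows)]


class OneClassCentroid:
    """Stand-in OCC: trained on positives only; accepts points within a quantile radius."""

    def fit(self, pos, quantile=0.95):
        self.center = centroid(pos)
        radii = sorted(math.dist(x, self.center) for x in pos)
        self.radius = radii[min(len(radii) - 1, int(quantile * len(radii)))]
        return self

    def predict(self, rows):
        return [int(math.dist(x, self.center) <= self.radius) for x in rows]


class NearestCentroid:
    """Stand-in TCC: trained on both classes; predicts the nearer class centroid."""

    def fit(self, pos, neg):
        self.pos_c, self.neg_c = centroid(pos), centroid(neg)
        return self

    def predict(self, rows):
        return [int(math.dist(x, self.pos_c) <= math.dist(x, self.neg_c)) for x in rows]


def accuracy(model, pos, neg):
    """Fraction of held-out positives predicted 1 and negatives predicted 0."""
    preds = model.predict(pos + neg)
    truth = [1] * len(pos) + [0] * len(neg)
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def compare(seed=0, n=200, n_features=10, n_informative=2, shift=3.0, k=2):
    rng = random.Random(seed)
    pos, neg = make_data(rng, n, n_features, n_informative, shift)
    half = n // 2
    pos_tr, pos_te, neg_tr, neg_te = pos[:half], pos[half:], neg[:half], neg[half:]
    # Assumption: the filter scores use labelled training data from both classes.
    selected = top_k(fisher_scores(pos_tr, neg_tr), k)
    results = {}
    for name, idx in (("all", list(range(n_features))), ("selected", selected)):
        p_tr, p_te = project(pos_tr, idx), project(pos_te, idx)
        n_tr, n_te = project(neg_tr, idx), project(neg_te, idx)
        results[name] = {
            "OCC": accuracy(OneClassCentroid().fit(p_tr), p_te, n_te),  # positives only
            "TCC": accuracy(NearestCentroid().fit(p_tr, n_tr), p_te, n_te),
        }
    return selected, results


if __name__ == "__main__":
    selected, results = compare()
    print("selected features:", selected)
    for name, accs in results.items():
        print(name, {m: round(a, 3) for m, a in accs.items()})
```

Running the script prints the selected feature indices and each classifier's held-out accuracy with all features and with the selected subset. Because the data are synthetic, these numbers say nothing about plant miRNAs. The design choice worth noting is that a distance-based one-class model's acceptance radius grows with every uninformative feature, so it plausibly has more to gain from discarding them than a two-class model, which can use the negatives to place its decision boundary.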
Pages: 13
Related papers (50 in total)
  • [21] A New Feature Selection Method for One-Class Classification Problems
    Jeong, Young-Seon
    Kang, In-Ho
    Jeong, Myong-Kee
    Kong, Dongjoon
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (06): 1500-1509
  • [22] Performance of miniload systems with two-class storage
    Park, BC
    Foley, RD
    Frazelle, EH
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 170 (01): 144-155
  • [23] From one-class to two-class classification by incorporating expert knowledge: Novelty detection in human behaviour
    Oosterlinck, Dieter
    Benoit, Dries F.
    Baecke, Philippe
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2020, 282 (03): 1011-1024
  • [24] Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers
    Maldonado, Sebastian
    Montecinos, Claudio
    INTELLIGENT DATA ANALYSIS, 2014, 18 (01): 95-112
  • [25] Two-phase optimization for support vectors and parameter selection of support vector machines: Two-class classification
    Wu, Shinq-Jen
    Van-Hung Pham
    Thi-Nga Nguyen
    APPLIED SOFT COMPUTING, 2017, 59: 129-142
  • [26] A CLASS SPECIFIC FEATURE SELECTION METHOD FOR IMPROVING THE PERFORMANCE OF TEXT CLASSIFICATION
    Venkatesh, V.
    Sharan, S. B.
    Mahalaxmy, S.
    Monisha, S.
    Sanjey, Ashick D. S.
    Ashokkumar, P.
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2024, 25 (02): 1018-1028
  • [27] Linear regression and two-class classification with gene expression data
    Huang, XH
    Pan, W
    BIOINFORMATICS, 2003, 19 (16): 2072-2078
  • [28] Stratified Normalization LogitBoost for Two-Class Unbalanced Data Classification
    Song, Jie
    Lu, Xiaoling
    Liu, Miao
    Wu, Xizhi
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2011, 40 (10): 1587-1593
  • [29] Sparse Detector Imaging Sensor with Two-Class Silhouette Classification
    Russomanno, David
    Chari, Srikant
    Halford, Carl
    SENSORS, 2008, 8 (12): 7996-8015
  • [30] One Class Genetic-Based Feature Selection for Classification in Large Datasets
    Alkubabji, Murad
    Aldasht, Mohammed
    Adi, Safa
    BIG DATA, CLOUD AND APPLICATIONS, BDCA 2018, 2018, 872: 301-311