The impact of feature selection on one and two-class classification performance for plant microRNAs

Cited by: 12
Authors
Khalifa, Waleed [1 ,2 ]
Yousef, Malik [1 ,2 ]
Demirci, Muserref Duygu Sacar [3 ]
Allmer, Jens [3 ,4 ]
Affiliations
[1] Coll Sakhnin, Comp Sci, Sakhnin, Israel
[2] Galilee Soc, Inst Appl Res, Shefa Amr, Israel
[3] Izmir Inst Technol, Mol Biol & Genet, Izmir, Turkey
[4] Bionia Inc, IZTEKGEB, Izmir, Turkey
Source
PEERJ | 2016 / Vol. 4
Keywords
MicroRNA; Machine learning; Feature selection; Plant; One-class classification; Two-class classification; PREDICTION; SVM; MIRBASE;
DOI
10.7717/peerj.2135
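Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code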
07 ; 0710 ; 09 ;
Abstract
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This processing ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is involved, and therefore machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, negative examples are hard to come by, so one-class classification (OCC) has been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, approximately 29% better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection: its largest performance gap is approximately 13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features.
Pages: 13
Related papers
50 in total
  • [1] Bi-objective feature selection for discriminant analysis in two-class classification
    Pacheco, Joaquin
    Casado, Silvia
    Angel-Bello, Francisco
    Alvarez, Ada
    KNOWLEDGE-BASED SYSTEMS, 2013, 44 : 57 - 64
  • [2] An Effective Metaheuristic for Bi-objective Feature Selection in Two-Class Classification Problem
    Lyubchenko, A. A.
    Pacheco, J. A.
    Casado, S.
    Nunez, L.
    XII INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE APPLIED MECHANICS AND SYSTEMS DYNAMICS, 2019, 1210
  • [3] A New Feature Selection Algorithm for Two-Class Classification Problems and Application to Endometrial Cancer
    Ahsen, M. Eren
    Singh, Nitin K.
    Boren, Todd
    Vidyasagar, M.
    White, Michael A.
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 2976 - 2982
  • [4] Two-Class Weather Classification
    Lu, Cewu
    Lin, Di
    Jia, Jiaya
    Tang, Chi-Keung
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3718 - 3725
  • [5] Two-Class Weather Classification
    Lu, Cewu
    Lin, Di
    Jia, Jiaya
    Tang, Chi-Keung
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2510 - 2524
  • [6] A Feature Transformation Method using Genetic Programming for Two-Class Classification
    Hiroyasu, Tomoyuki
    Shiraishi, Toshihide
    Yoshida, Tomoya
    Yamamoto, Utako
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING (CIDM), 2014, : 234 - 240
  • [7] An experimental comparison of feature selection methods on two-class biomedical datasets
    Drotar, P.
    Gazda, J.
    Smekal, Z.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2015, 66 : 1 - 10
  • [8] Feature Selection and Ensemble Learning Techniques in One-Class Classifiers: An Empirical Study of Two-Class Imbalanced Datasets
    Tsai, Chih-Fong
    Lin, Wei-Chao
    IEEE ACCESS, 2021, 9 : 13717 - 13726
  • [9] Two-Class with Oversampling Versus One-Class Classification for Microarray Datasets
    Perez-Sanchez, Beatriz
    Fontenla-Romero, Oscar
    Sanchez-Marono, Noelia
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2016, PT II, 2016, 9887 : 398 - 405
  • [10] A Feature Transformation Method using Multiobjective Genetic Programming for Two-Class Classification
    Hiroyasu, Tomoyuki
    Shiraishi, Toshihide
    Yoshida, Tomoya
    Yamamoto, Utako
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 2989 - 2995