Protein classification with imbalanced data

Cited by: 100
Authors
Zhao, Xing-Ming [1 ,2 ,3 ]
Li, Xin [4 ]
Chen, Luonan [3 ,5 ,6 ]
Aihara, Kazuyuki [1 ,3 ]
Affiliations
[1] JST, ERATO, Aihara Complex Modelling Projects, Tokyo 1510064, Japan
[2] Chinese Acad Sci, Hefei Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[3] Univ Tokyo, Inst Ind Sci, Tokyo 1538505, Japan
[4] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[5] Osaka Sangyo Univ, Dept Elect & Elect Engn, Osaka 5748530, Japan
[6] Shanghai Univ, Inst Syst Biol, Shanghai 200444, Peoples R China
DOI
10.1002/prot.21870
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Generally, protein classification is a multi-class classification problem that can be reduced to a set of binary classification problems, with one classifier designed for each class. The proteins in a given class are treated as positive examples, while those outside the class are treated as negative examples. However, this decomposition gives rise to a class imbalance problem, because the number of proteins in one class is usually much smaller than the number of proteins outside it. As a result, the imbalanced data make classifiers prone to overfitting and to poor performance, particularly on the minority class. This article presents a new technique for protein classification with imbalanced data. First, we propose a new algorithm that overcomes the imbalance problem in protein classification by combining a new sampling technique with a committee of classifiers. Then, classifiers trained in different feature spaces are combined to further improve classification accuracy. Numerical experiments on benchmark datasets show promising results, which confirm the effectiveness of the proposed method in terms of accuracy. The Matlab code and supplementary materials are available at http://server2.sat.iis.u-tokyo.ac.jp/~xmzhao/proteins.html.
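The abstract does not specify the sampling technique or the base classifiers, so the following is only a minimal sketch of the general pattern it describes, not the authors' algorithm. It rests on our own assumptions: one-vs-rest relabeling of a multi-class problem; random undersampling, in which every committee member sees all positives plus an equal-sized random subset of negatives (a generic technique swapped in for the paper's unspecified sampling method); soft voting by averaging member scores; and averaging committee scores across feature spaces. The nearest-centroid base classifier, all function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def one_vs_rest_labels(classes, target):
    """Binary reduction: proteins in `target` become +1, all others -1."""
    return [1 if c == target else -1 for c in classes]


def centroid(rows):
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical toy base classifier (assumption, not from the paper)."""

    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == -1])
        return self

    def score(self, x):
        # Positive when x lies closer to the positive-class centroid.
        return sq_dist(x, self.neg) - sq_dist(x, self.pos)


def balanced_committee(X, y, n_members=5, seed=0):
    """Assumed sampling: all positives plus an equal-sized random subset of negatives per member."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    members = []
    for _ in range(n_members):
        idx = pos + rng.sample(neg, min(len(pos), len(neg)))
        members.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return members


def committee_score(members, x):
    # Soft voting: average the member scores.
    return mean(m.score(x) for m in members)


def combined_score(committees, views):
    """Average committee scores over feature spaces; views[k] is one protein's vector in space k."""
    return mean(committee_score(c, v) for c, v in zip(committees, views))


# Toy data (hypothetical): 3 proteins in class "A" versus 12 in other classes.
classes = ["A"] * 3 + ["B"] * 6 + ["C"] * 6
y = one_vs_rest_labels(classes, "A")
view_a = [[5, 5], [5, 6], [6, 5]] + [
    [0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
    [0.8, 0.2], [0.3, 0.3], [0.7, 0.7], [0.1, 0.4], [0.4, 0.1], [0.6, 0.9],
]
view_b = [[3.0], [3.2], [2.8]] + [
    [v] for v in (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.05, 0.15, 0.25, 0.35, 0.45, 0.0)
]
committees = [balanced_committee(view_a, y, seed=1), balanced_committee(view_b, y, seed=2)]

print(combined_score(committees, [[5.5, 5.5], [3.1]]) > 0)   # True
print(combined_score(committees, [[0.2, 0.1], [0.05]]) > 0)  # False
```

Each member trains on a balanced 3-versus-3 subset, so no single model is dominated by the majority class, while the committee as a whole still sees many different negatives. Averaging scores across the two feature spaces is one simple fusion rule; the paper's actual combination scheme may differ.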
Pages: 1125–1132
Number of pages: 8
Related papers
50 in total
  • [41] Classification of Imbalanced Data in E-Commerce
    McLean, Liliya Besaleva
    Weaver, Alfred C.
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017: 744 - 750
  • [42] ACTIVE SMOTE for Imbalanced Medical Data Classification
    Sena, Raul
    Ben Hamida, Sana
    ADVANCES IN INFORMATION SYSTEMS, ARTIFICIAL INTELLIGENCE AND KNOWLEDGE MANAGEMENT, ICIKS 2023, 2024, 486 : 81 - 97
  • [43] Dynamic Curriculum Learning for Imbalanced Data Classification
    Wang, Yiru
    Gan, Weihao
    Yang, Jie
    Wu, Wei
    Yan, Junjie
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 5016 - 5025
  • [44] Local neighborhood encodings for imbalanced data classification
    Koziarski, Michal
    Wozniak, Michal
    MACHINE LEARNING, 2024, 113 (10) : 7421 - 7449
  • [45] The improved AdaBoost algorithms for imbalanced data classification
    Wang, Wenyang
    Sun, Dongchu
    INFORMATION SCIENCES, 2021, 563 : 358 - 374
  • [46] An Evolutionary Sampling Approach for Classification with Imbalanced Data
    Fernandes, Everlandio R. Q.
    de Carvalho, Andre C. P. L. F.
    Coelho, Andre L. V.
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [47] A review of boosting methods for imbalanced data classification
    Li, Qiujie
    Mao, Yaobin
    PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (04) : 679 - 693
  • [48] Training and assessing classification rules with imbalanced data
    Menardi, Giovanna
    Torelli, Nicola
    DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (01) : 92 - 122
  • [49] Leveraging ensemble pruning for imbalanced data classification
    Krawczyk, Bartosz
    Wozniak, Michal
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018: 439 - 444
  • [50] An automated approach for binary classification on imbalanced data
    Vieira, Pedro Marques
    Rodrigues, Fátima
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 : 2747 - 2767