Protein classification with imbalanced data

Cited by: 100
Authors
Zhao, Xing-Ming [1, 2, 3]
Li, Xin [4]
Chen, Luonan [3, 5, 6]
Aihara, Kazuyuki [1, 3]
Affiliations
[1] JST, ERATO, Aihara Complex Modelling Projects, Tokyo 1510064, Japan
[2] Chinese Acad Sci, Hefei Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[3] Univ Tokyo, Inst Ind Sci, Tokyo 1538505, Japan
[4] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[5] Osaka Sangyo Univ, Dept Elect & Elect Engn, Osaka 5748530, Japan
[6] Shanghai Univ, Inst Syst Biol, Shanghai 200444, Peoples R China
DOI
10.1002/prot.21870
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Generally, protein classification is a multi-class classification problem and can be reduced to a set of binary classification problems, where one classifier is designed for each class. The proteins in one class are seen as positive examples while those outside the class are seen as negative examples. However, the imbalanced problem will arise in this case because the number of proteins in one class is usually much smaller than that of the proteins outside the class. As a result, the imbalanced data cause classifiers to tend to overfit and to perform poorly in particular on the minority class. This article presents a new technique for protein classification with imbalanced data. First, we propose a new algorithm to overcome the imbalanced problem in protein classification with a new sampling technique and a committee of classifiers. Then, classifiers trained in different feature spaces are combined together to further improve the accuracy of protein classification. The numerical experiments on benchmark datasets show promising results, which confirms the effectiveness of the proposed method in terms of accuracy. The Matlab code and supplementary materials are available at http://server2.sat.iis.u-tokyo.ac.jp/~xmzhao/proteins.html.
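The one-vs-rest reduction and the idea of a committee trained on rebalanced samples can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify their sampling technique, so plain random undersampling of the negative class is used here as a stand-in, and the nearest-centroid base learner, the function names, and the toy data are all assumptions made for illustration. Combining classifiers across feature spaces is not covered.

```python
import random


def one_vs_rest_labels(labels, target):
    """Reduce multi-class labels to binary: 1 for `target`, 0 for all others."""
    return [1 if y == target else 0 for y in labels]


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (an assumption; any binary classifier could be used)."""

    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.pos) <= sq_dist(x, self.neg) else 0


def train_committee(X, y, n_members=5, seed=0):
    """Train each member on all positives plus an equal-sized random
    subset of negatives (random undersampling, used here as a stand-in)."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    committee = []
    for _ in range(n_members):
        sample = rng.sample(neg, min(len(pos), len(neg)))
        Xb = pos + sample
        yb = [1] * len(pos) + [0] * len(sample)
        committee.append(NearestCentroid().fit(Xb, yb))
    return committee


def committee_predict(committee, x):
    """Majority vote over the committee; ties count as positive."""
    votes = sum(member.predict(x) for member in committee)
    return 1 if 2 * votes >= len(committee) else 0


# Hypothetical toy data: 3 proteins in the target class, 20 outside it.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]]
X += [[5 + 0.1 * i, 5 - 0.1 * i] for i in range(20)]
labels = ["kinase"] * 3 + ["other_a"] * 10 + ["other_b"] * 10

y = one_vs_rest_labels(labels, "kinase")
committee = train_committee(X, y)
print(committee_predict(committee, [0.1, 0.1]))
print(committee_predict(committee, [5.5, 4.5]))
```

In a full multi-class setting, one such committee would be trained per class, each using its own one-vs-rest labels.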
Pages: 1125-1132
Page count: 8