Protein classification with imbalanced data

Cited by: 100
Authors
Zhao, Xing-Ming [1, 2, 3]
Li, Xin [4]
Chen, Luonan [3, 5, 6]
Aihara, Kazuyuki [1, 3]
Affiliations
[1] JST, ERATO, Aihara Complex Modelling Projects, Tokyo 1510064, Japan
[2] Chinese Acad Sci, Hefei Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[3] Univ Tokyo, Inst Ind Sci, Tokyo 1538505, Japan
[4] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[5] Osaka Sangyo Univ, Dept Elect & Elect Engn, Osaka 5748530, Japan
[6] Shanghai Univ, Inst Syst Biol, Shanghai 200444, Peoples R China
DOI
10.1002/prot.21870
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Protein classification is generally a multi-class problem that can be reduced to a set of binary classification problems, with one classifier designed for each class. The proteins in a class are treated as positive examples, and those outside the class as negative examples. This reduction introduces a class-imbalance problem, because the number of proteins in one class is usually much smaller than the number outside it. As a result, classifiers trained on the imbalanced data tend to overfit and perform poorly, particularly on the minority class. This article presents a new technique for protein classification with imbalanced data. First, we propose a new algorithm that addresses the imbalance with a new sampling technique and a committee of classifiers. Then, classifiers trained in different feature spaces are combined to further improve classification accuracy. Numerical experiments on benchmark datasets show promising results, which confirm the effectiveness of the proposed method in terms of accuracy. The Matlab code and supplementary materials are available at http://server2.sat.iis.u-tokyo.ac.jp/~xmzhao/proteins.html.
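The abstract does not specify the sampling technique or the member classifiers, so the following is only a minimal sketch of the general idea it describes: reduce the multi-class labels to one-vs-rest binary labels, then train a committee in which each member sees all positives plus a random, equally sized sample of negatives. The random undersampling, the toy `NearestCentroid` classifier, the majority vote, and all function names are assumptions made for illustration; this is not the authors' algorithm or their Matlab code.

```python
import random
from collections import Counter


def one_vs_rest_labels(labels, target):
    """Binary labels: 1 for examples in the `target` class, 0 for all others."""
    return [1 if y == target else 0 for y in labels]


def balanced_subsets(X, y, n_members, seed=0):
    """Yield n_members training sets: all positives plus an equal-size
    random sample of negatives (plain random undersampling, an assumption)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    k = min(len(pos), len(neg))
    for _ in range(n_members):
        idx = pos + rng.sample(neg, k)
        yield [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Toy stand-in for a real base classifier (hypothetical choice)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def train_committee(X, y, n_members=5, seed=0):
    """Train one classifier per balanced subset."""
    return [NearestCentroid().fit(Xs, ys)
            for Xs, ys in balanced_subsets(X, y, n_members, seed)]


def committee_predict(committee, x):
    """Majority vote over committee members (use an odd n_members to avoid ties)."""
    votes = Counter(member.predict_one(x) for member in committee)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: 2 proteins in class "A" (minority) and 10 in other classes.
    X = [[5.0, 5.0], [5.2, 4.8]] + [[0.1 * i, -0.1 * i] for i in range(10)]
    labels = ["A", "A"] + ["B"] * 5 + ["C"] * 5
    y = one_vs_rest_labels(labels, "A")
    committee = train_committee(X, y, n_members=5)
    print(committee_predict(committee, [5.0, 5.0]))
    print(committee_predict(committee, [0.0, 0.0]))
```

Because every member is trained on a balanced subset, no single classifier is dominated by the negative class, while the committee as a whole still sees many different negatives. Combining classifiers trained in different feature spaces, as the abstract describes, would add a second level of voting and is not shown here.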
Pages: 1125-1132
Number of pages: 8