Imbalanced Text Classification on Host Pathogen Protein-Protein Interaction Documents

Cited by: 3
Authors
Xu, Guixian [1,2]
Niu, Zhendong [2]
Gao, Xu [4]
Liu, Hongfang [3]
Affiliations
[1] Minzu Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Coll Comp Sci, Beijing, Peoples R China
[3] Georgetown Univ, Med Ctr, Dept Bio3, Washington, DC 20007 USA
[4] North China Grid Co Ltd, Beijing, Peoples R China
Source
2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 1 | 2010
Funding
U.S. National Science Foundation
Keywords
imbalanced text classification; machine learning; protein-protein interaction
DOI
10.1109/ICCAE.2010.5451921
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Protein-protein interactions (PPIs) are important in understanding the fundamental processes governing cell biology. However, a large number of scientific findings about PPIs are buried in the growing volume of biomedical literature. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. In practice, however, only a small number of positive documents can usually be obtained, either manually or from PPI knowledge bases with literature-based evidence, whereas negative documents are plentiful. In this paper, we investigate the effects of feature selection, feature weighting, and the kernel function of Support Vector Machines (SVMs) on imbalanced two-class classification, using 1360 host-pathogen protein-protein interaction documents. The results show that a suitable feature weighting approach is an important factor in improving classification performance. Adjusting the cost-sensitive parameter of an SVM with a radial basis function (RBF) kernel can decrease the minority-class misclassification ratio and increase classification accuracy on imbalanced document classification. An automated classification system to identify MEDLINE abstracts referring to host-pathogen protein-protein interactions can be developed based on these experiments.
Pages: 418-422
Page count: 5
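
The abstract names two levers for imbalanced classification: feature weighting and a cost-sensitive SVM parameter. This record does not state which weighting schemes or parameter values the authors used, so the following is only a minimal sketch under stated assumptions: TF-IDF stands in for "a feature weighting approach", inverse-frequency class weights stand in for the cost-sensitive setting, and the function names and toy data are hypothetical. It also shows why plain accuracy can hide minority-class errors.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight each term by term frequency times log inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def balanced_class_weights(labels):
    """Per-class cost multipliers: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {cls: len(labels) / (len(counts) * k) for cls, k in counts.items()}


def accuracy_and_minority_error(y_true, y_pred, minority=1):
    """Overall accuracy and the fraction of minority documents misclassified."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == minority]
    missed = sum(t != p for t, p in minority_pairs)
    return correct / len(y_true), missed / len(minority_pairs)


# Hypothetical toy data: 2 positive (PPI-related) and 8 negative documents.
labels = [1, 1] + [0] * 8
print(balanced_class_weights(labels))   # {1: 2.5, 0: 0.625}

# A classifier that always predicts "negative" looks accurate but misses every positive.
print(accuracy_and_minority_error(labels, [0] * 10))   # (0.8, 1.0)

# Hypothetical tokenised documents, not taken from the paper's corpus.
docs = [["virus", "binds", "host"], ["host", "cell"]]
print(tfidf(docs)[0]["host"])   # 0.0, because "host" occurs in every document
```

In a library such as scikit-learn, weights of this kind could be passed to an RBF-kernel `SVC` through its `class_weight` parameter, which scales the misclassification penalty per class; whether that matches the authors' setup is an assumption.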