Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Cited by: 41
Authors
Li, Fuyi [1]
Dong, Shuangyu [2]
Leier, Andre [3, 4, 5]
Han, Meiya [6]
Guo, Xudong
Xu, Jing [6, 7]
Wang, Xiaoyu [6, 7]
Pan, Shirui [8, 9]
Jia, Cangzhi [10]
Zhang, Yang [11]
Webb, Geoffrey I. [12, 13]
Coin, Lachlan J. M. [14, 15]
Li, Chen [6, 7]
Song, Jiangning [16, 17]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic, Australia
[3] UAB Sch Med, Dept Genet, Birmingham, AL USA
[4] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[5] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Biomed Discovery Inst, Melbourne, Vic, Australia
[8] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[9] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
[10] Dalian Maritime Univ, Coll Sci, Dalian, Peoples R China
[11] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[12] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[13] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[14] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[15] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
[16] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[17] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
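Funding
UK Medical Research Council; Australian Research Council; National Health and Medical Research Council of Australia; US National Institutes of Health
Keywords
positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition; PROTEIN FUNCTION; PREDICTION; INTEGRATION; SEQUENCE; SITES; PROMOTERS; NETWORKS
DOI
10.1093/bib/bbab461
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract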
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from a limited number of positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning applications in bioinformatics. Important aspects are discussed in detail, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives on the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for better understanding the PU learning framework in bioinformatics and for developing next-generation PU learning frameworks for critical biological applications.
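The PU setting described in the abstract can be illustrated with a small sketch. It is not taken from the paper. It uses one well-known PU technique, the labeling-frequency correction of Elkan and Noto, and assumes that labeled positives are selected completely at random from all positives. The synthetic data and every function name (fit_logistic, predict_proba, make_pu_data, pu_posterior) are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, s, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression; the bias is the last weight."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - s) / len(s)
    return w


def predict_proba(w, X):
    """Return the fitted probability P(s = 1 | x) for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def make_pu_data(n=2000, label_freq=0.3, seed=0):
    """Synthetic PU data (an assumption of this sketch, not real biology).

    y is the hidden true class; s = 1 marks a labeled positive and
    s = 0 marks an unlabeled sample (either a positive or a negative).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    centers = np.where(y[:, None] == 1, 2.0, -2.0)
    X = rng.normal(loc=centers, scale=1.0, size=(n, 2))
    s = ((y == 1) & (rng.random(n) < label_freq)).astype(float)
    return X, y, s


def pu_posterior(X, s, seed=0):
    """Estimate P(y = 1 | x) from labeled-vs-unlabeled data only.

    Step 1: train g(x) ~ P(s = 1 | x) on labeled vs unlabeled samples.
    Step 2: estimate c = P(s = 1 | y = 1) as the mean of g over
            held-out labeled positives.
    Step 3: rescale, P(y = 1 | x) ~ g(x) / c, clipped to [0, 1].
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(s))
    n_hold = len(s) // 5
    hold, train = idx[:n_hold], idx[n_hold:]
    w = fit_logistic(X[train], s[train])
    held_out_positives = X[hold][s[hold] == 1]
    c = predict_proba(w, held_out_positives).mean()
    posterior = np.clip(predict_proba(w, X) / c, 0.0, 1.0)
    return posterior, c


if __name__ == "__main__":
    X, y, s = make_pu_data()
    posterior, c = pu_posterior(X, s)
    accuracy = ((posterior > 0.5).astype(int) == y).mean()
    print(f"estimated label frequency c: {c:.3f}")
    print(f"accuracy against the hidden labels: {accuracy:.3f}")
```

The hidden labels y are used only to check the result. The classifier itself sees nothing but the labeled/unlabeled indicator s, which is the defining constraint of PU learning.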
Pages: 13
Related papers
50 in total
  • [1] AdaSampling for Positive-Unlabeled and Label Noise Learning With Bioinformatics Applications
    Yang, Pengyi
    Ormerod, John T.
    Liu, Wei
    Ma, Chendong
    Zomaya, Albert Y.
    Yang, Jean Y. H.
    IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (05) : 1932 - 1943
  • [2] Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning
    Ju, Zhe
    Wang, Shi-Yun
    CURRENT GENOMICS, 2020, 21 (03) : 204 - 211
  • [3] Density Estimators for Positive-Unlabeled Learning
    Basile, Teresa M. A.
    Di Mauro, Nicola
    Esposito, Floriana
    Ferilli, Stefano
    Vergari, Antonio
    NEW FRONTIERS IN MINING COMPLEX PATTERNS, NFMCP 2017, 2018, 10785 : 49 - 64
  • [4] Generative Adversarial Positive-Unlabeled Learning
    Hou, Ming
    Chaib-draa, Brahim
    Li, Chao
    Zhao, Qibin
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2255 - 2261
  • [5] Positive-Unlabeled Learning in Streaming Networks
    Chang, Shiyu
    Zhang, Yang
    Tang, Jiliang
    Yin, Dawei
    Chang, Yi
    Hasegawa-Johnson, Mark A.
    Huang, Thomas S.
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 755 - 764
  • [6] Positive-Unlabeled Learning for Knowledge Distillation
    Ning Jiang
    Jialiang Tang
    Wenxin Yu
    Neural Processing Letters, 2023, 55 : 2613 - 2631
  • [7] Positive-Unlabeled Learning for Knowledge Distillation
    Jiang, Ning
    Tang, Jialiang
    Yu, Wenxin
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 2613 - 2631
  • [8] A boosting framework for positive-unlabeled learning
    Zhao, Yawen
    Zhang, Mingzhe
    Zhang, Chenhao
    Chen, Weitong
    Ye, Nan
    Xu, Miao
    STATISTICS AND COMPUTING, 2025, 35 (01)
  • [9] Positive-Unlabeled Learning With Label Distribution Alignment
    Jiang, Yangbangyan
    Xu, Qianqian
    Zhao, Yunrui
    Yang, Zhiyong
    Wen, Peisong
    Cao, Xiaochun
    Huang, Qingming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15345 - 15363
  • [10] Positive-Unlabeled Learning for Network Link Prediction
    Gan, Shengfeng
    Alshahrani, Mohammed
    Liu, Shichao
    MATHEMATICS, 2022, 10 (18)