Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Cited by: 41
Authors
Li, Fuyi [1]
Dong, Shuangyu [2]
Leier, Andre [3, 4, 5]
Han, Meiya [6]
Guo, Xudong
Xu, Jing [6, 7]
Wang, Xiaoyu [6, 7]
Pan, Shirui [8, 9]
Jia, Cangzhi [10]
Zhang, Yang [11]
Webb, Geoffrey I. [12, 13]
Coin, Lachlan J. M. [14, 15]
Li, Chen [6, 7]
Song, Jiangning [16, 17]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic, Australia
[3] UAB Sch Med, Dept Genet, Birmingham, AL USA
[4] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[5] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Biomed Discovery Inst, Melbourne, Vic, Australia
[8] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[9] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
[10] Dalian Maritime Univ, Coll Sci, Dalian, Peoples R China
[11] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[12] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[13] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[14] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[15] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
[16] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[17] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
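Funding
UK Medical Research Council; Australian Research Council; National Health and Medical Research Council of Australia; US National Institutes of Health
Keywords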
positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition; PROTEIN FUNCTION; PREDICTION; INTEGRATION; SEQUENCE; SITES; PROMOTERS; NETWORKS;
DOI
10.1093/bib/bbab461
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
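As an illustration of the PU setting the abstract describes, the sketch below shows how a PU dataset differs from a fully labeled one, and one well-known way to correct a classifier trained on it. It is not taken from the review: the correction shown is the Elkan and Noto (2008) rescaling, which assumes labeled positives are selected completely at random, and all function names and parameters (`make_pu_labels`, `label_frequency`, `adjust_scores`) are hypothetical.

```python
import random


def make_pu_labels(y_true, label_frequency, seed=0):
    """Simulate a PU dataset from fully labeled data (illustrative only).

    Each true positive keeps its label (s = 1) with probability
    `label_frequency`; every other sample becomes unlabeled (s = 0).
    The unlabeled set is therefore a mixture of positives and negatives.
    """
    rng = random.Random(seed)
    return [1 if y == 1 and rng.random() < label_frequency else 0
            for y in y_true]


def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(s=1 | y=1) as the mean score g(x) = P(s=1 | x)
    over held-out labeled positives (Elkan-Noto estimator; assumption:
    positives are labeled completely at random)."""
    scores = list(scores_on_labeled_positives)
    return sum(scores) / len(scores)


def adjust_scores(scores, c):
    """Turn labeled-vs-unlabeled scores P(s=1 | x) into estimates of
    P(y=1 | x) = P(s=1 | x) / c, clipped to at most 1."""
    return [min(1.0, g / c) for g in scores]
```

In use, `scores` would come from any probabilistic classifier trained to separate labeled from unlabeled samples; the rescaling then raises every score, because a sample being unlabeled is only weak evidence that it is negative.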
Pages: 13
Related papers
50 in total
  • [21] Recovering True Classifier Performance in Positive-Unlabeled Learning
    Jain, Shantanu
    White, Martha
    Radivojac, Predrag
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2066 - 2072
  • [22] Spotting Fake Reviews using Positive-Unlabeled Learning
    Li, Huayi
    Liu, Bing
    Mukherjee, Arjun
    Shao, Jidong
    COMPUTACION Y SISTEMAS, 2014, 18 (03): : 467 - 475
  • [23] Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning
    Niu, Gang
    du Plessis, Marthinus C.
    Sakai, Tomoya
    Ma, Yao
    Sugiyama, Masashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [24] Investigating Active Positive-Unlabeled Learning with Deep Networks
    Han, Kun
    Chen, Weitong
    Xu, Miao
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 607 - 618
  • [25] Bootstrap Latent Prototypes for graph positive-unlabeled learning
    Liang, Chunquan
    Tian, Yi
    Zhao, Dongmin
    Li, Mei
    Pan, Shirui
    Zhang, Hongming
    Wei, Jicheng
    INFORMATION FUSION, 2024, 112
  • [26] Positive-Unlabeled Compression on the Cloud
    Xu, Yixing
    Wang, Yunhe
    Chen, Hanting
    Han, Kai
    Xu, Chunjing
    Tao, Dacheng
    Xu, Chang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [27] Positive-Unlabeled Domain Adaptation
    Sonntag, Jonas
    Behrens, Gunnar
    Schmidt-Thieme, Lars
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 66 - 75
  • [28] GradPU: Positive-Unlabeled Learning via Gradient Penalty and Positive Upweighting
    Dai, Songmin
    Li, Xiaoqiang
    Zhou, Yue
    Ye, Xichen
    Liu, Tong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7296 - +
  • [29] PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference
    Wang, Xutao
    Chen, Hanting
    Guo, Tianyu
    Wang, Yunhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [30] PULNS: Positive-Unlabeled Learning with Effective Negative Sample Selector
    Luo, Chuan
    Zhao, Pu
    Chen, Chen
    Qiao, Bo
    Du, Chao
    Zhang, Hongyu
    Wu, Wei
    Cai, Shaowei
    He, Bing
    Rajmohan, Saravanakumar
    Lin, Qingwei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8784 - 8792