Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Cited by: 41
Authors
Li, Fuyi [1]
Dong, Shuangyu [2]
Leier, Andre [3, 4, 5]
Han, Meiya [6]
Guo, Xudong
Xu, Jing [6, 7]
Wang, Xiaoyu [6, 7]
Pan, Shirui [8, 9]
Jia, Cangzhi [10]
Zhang, Yang [11]
Webb, Geoffrey I. [12, 13]
Coin, Lachlan J. M. [14, 15]
Li, Chen [6, 7]
Song, Jiangning [16, 17]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic, Australia
[3] UAB Sch Med, Dept Genet, Birmingham, AL USA
[4] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[5] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Biomed Discovery Inst, Melbourne, Vic, Australia
[8] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[9] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
[10] Dalian Maritime Univ, Coll Sci, Dalian, Peoples R China
[11] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[12] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[13] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[14] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[15] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
[16] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[17] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
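Funding
UK Medical Research Council; Australian Research Council; National Health and Medical Research Council of Australia; US National Institutes of Health
Keywords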
positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition; PROTEIN FUNCTION; PREDICTION; INTEGRATION; SEQUENCE; SITES; PROMOTERS; NETWORKS;
DOI
10.1093/bib/bbab461
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled because of the limited sensitivity of the experimental equipment. The positive-unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from a limited number of positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning applications in bioinformatics. Several important aspects are discussed in detail, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives on the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for better understanding the PU learning framework in bioinformatics and for developing next-generation PU learning frameworks for critical biological applications.
Pages: 13
Related papers
50 results in total
  • [41] Unsupervised Body Hair Detection by Positive-Unlabeled Learning in Photoacoustic Image
    Kikkawa, Ryo
    Kajita, Hiroki
    Imanishi, Nobuaki
    Aiso, Sadakazu
    Bise, Ryoma
    2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC), 2021, : 3349 - 3352
  • [42] EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites
    Nan, Xuanguo
    Bao, Lingling
    Zhao, Xiaosa
    Zhao, Xiaowei
    Sangaiah, Arun Kumar
    Wang, Gai-Ge
    Ma, Zhiqiang
    MOLECULES, 2017, 22 (09):
  • [43] Positive-unlabeled learning for the prediction of conformational B-cell epitopes
    Jing Ren
    Qian Liu
    John Ellis
    Jinyan Li
    BMC Bioinformatics, 16
  • [44] Positive-Unlabeled Learning for inferring drug interactions based on heterogeneous attributes
    Hameed, Pathima Nusrath
    Verspoor, Karin
    Kusljic, Snezana
    Halgamuge, Saman
    BMC BIOINFORMATICS, 2017, 18
  • [45] Entropy Weight Allocation: Positive-unlabeled Learning via Optimal Transport
    Gu, Wen
    Zhang, Teng
    Jin, Hai
    PROCEEDINGS OF THE 2022 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2022, : 37 - 45
  • [46] An Integrated Framework of Positive-Unlabeled and Imbalanced Learning for Landslide Susceptibility Mapping
    Fu, Zijin
    Ma, Hao
    Wang, Fawu
    Dou, Jie
    Zhang, Bo
    Fang, Zhice
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 15596 - 15611
  • [47] Positive-Unlabeled Learning for inferring drug interactions based on heterogeneous attributes
    Pathima Nusrath Hameed
    Karin Verspoor
    Snezana Kusljic
    Saman Halgamuge
    BMC Bioinformatics, 18
  • [48] Intrusion Detection based on Non-negative Positive-unlabeled Learning
    Lv, Sicai
    Liu, Yang
    Liu, Zhiyao
    Chao, Wang
    Wu, Chenrui
    Wang, Bailing
    PROCEEDINGS OF 2020 IEEE 9TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE (DDCLS'20), 2020, : 1015 - 1020
  • [49] Semi-supervised AUC optimization based on positive-unlabeled learning
    Sakai, Tomoya
    Niu, Gang
    Sugiyama, Masashi
    MACHINE LEARNING, 2018, 107 (04) : 767 - 794
  • [50] A multi-objective evolutionary algorithm for robust positive-unlabeled learning
    Qiu, Jianfeng
    Tang, Qi
    Tan, Ming
    Li, Kaixuan
    Xie, Juan
    Cai, Xiaoqiang
    Cheng, Fan
    INFORMATION SCIENCES, 2024, 678