Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

Cited by: 2
Authors
Zhu, Zhangchi [1,2]
Wang, Lu [2 ]
Zhao, Pu [2 ]
Du, Chao [2 ]
Zhang, Wei [1 ]
Dong, Hang [2 ]
Qiao, Bo [2 ]
Lin, Qingwei [2 ]
Rajmohan, Saravan [3 ]
Zhang, Dongmei [2 ]
Affiliations
[1] East China Normal University, Shanghai, China
[2] Microsoft Research, Beijing, China
[3] Microsoft 365, Seattle, WA, USA
Funding
National Natural Science Foundation of China
Keywords
positive-unlabeled learning; curriculum learning
DOI
10.1145/3580305.3599491
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds, so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors from misclassifying unlabeled positive samples as negatives inevitably appear and may even accumulate during training. These errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. A similar intuition is exploited in curriculum learning, which uses only easier cases in the early stage of training before introducing more complex ones. Specifically, we utilize a novel "hardness" measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during training, including more "easy" samples in the early stages. Extensive experimental validation over a wide range of learning tasks shows that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU.
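The selection strategy the abstract describes can be illustrated with a minimal sketch. The snippet below is not the authors' released implementation (see the GitHub link above for that); the function name select_pseudo_negatives, the use of the model's predicted positive probability as the "hardness" score, and the linear pacing schedule from start_frac to end_frac are all illustrative assumptions.

import numpy as np

def select_pseudo_negatives(scores, epoch, num_epochs,
                            start_frac=0.2, end_frac=0.8):
    """Pick the 'easiest' unlabeled samples as pseudo-negatives.

    scores: model-predicted probability of being positive for each
        unlabeled sample; a lower score means an 'easier' negative.
    The kept fraction grows linearly from start_frac to end_frac so
    that harder (noisier) samples only enter training later, in the
    spirit of curriculum learning.
    """
    frac = start_frac + (end_frac - start_frac) * epoch / max(1, num_epochs - 1)
    k = int(frac * len(scores))
    # indices of the k lowest-scoring (least positive-looking) samples
    return np.argsort(scores)[:k]

# Toy usage: 10 unlabeled samples, first epoch of 5.
rng = np.random.default_rng(0)
scores = rng.random(10)
neg_idx = select_pseudo_negatives(scores, epoch=0, num_epochs=5)
print(neg_idx)  # the easiest 20% at the start of training

Under this sketch, each training round would fit the classifier on the labeled positives plus the currently selected pseudo-negatives and then recompute the scores, so that early mistakes on hard, noisy samples can be revised in later rounds, which is the self-correction behavior the title refers to.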
Pages: 3663-3673
Number of pages: 11