Learning from data streams with only positive and unlabeled data

Cited by: 13
Authors
Qin, Xiangju [1 ]
Zhang, Yang [1 ,2 ]
Li, Chen [1 ]
Li, Xue [3 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; Decision trees; Classification
DOI
10.1007/s10844-012-0231-6
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many studies on streaming data classification assume that a fully labeled stream is available for learning. However, manually labeling a data stream for training is often too labor-intensive and time-consuming, which may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled (PU) learning problem and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easily obtainable in a streaming environment, making it widely applicable in practice. Here, we enhance Li et al.'s solution with three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution, called PUVFDT, achieves very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, it reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT still achieves classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
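The abstract highlights estimating the percentage of positive examples in a positive-and-unlabeled stream but does not spell the estimator out. The following is a minimal sketch, not the authors' PUVFDT method, of one standard way to estimate the positive-class fraction from positive and unlabeled data in the style of Elkan and Noto (2008); the synthetic data, the 40% true prior, the 30% labeling rate, and the scikit-learn classifier are all illustrative assumptions.

```python
# Minimal PU class-prior estimation sketch (Elkan & Noto style), assuming
# labeled positives are selected completely at random from all positives.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic snapshot of a stream: ~40% of examples are truly positive.
n = 10_000
y = (rng.random(n) < 0.4).astype(int)                 # hidden true labels
X = rng.normal(loc=y[:, None] * 3.0, scale=1.0, size=(n, 2))

# Only ~30% of the positives carry a label; everything else arrives unlabeled.
s = ((y == 1) & (rng.random(n) < 0.3)).astype(int)

# Step 1: fit a probabilistic classifier to separate labeled vs. unlabeled.
g = LogisticRegression().fit(X, s)

# Step 2: c = P(labeled | positive) is estimated as the mean score on
# the labeled positives.
c = g.predict_proba(X[s == 1])[:, 1].mean()

# Step 3: P(labeled) = c * P(positive), so the positive fraction is P(labeled) / c.
prior_hat = s.mean() / c
print(f"estimated positive fraction: {prior_hat:.3f}  (true: {y.mean():.3f})")
```

In a genuinely streaming setting such an estimate would have to be maintained incrementally as examples arrive, rather than recomputed on a stored batch as in this sketch.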
Pages: 405-430
Page count: 26