Learning from data streams with only positive and unlabeled data

Cited by: 13
Authors
Qin, Xiangju [1]
Zhang, Yang [1, 2]
Li, Chen [1]
Li, Xue [3]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; DECISION TREES; CLASSIFICATION;
DOI
10.1007/s10844-012-0231-6
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many studies on streaming data classification assume that a fully labeled stream is available for learning. However, manually labeling a data stream for training is often too labor-intensive and time-consuming. This difficulty can make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested treating these applications as a positive and unlabeled (PU) learning problem and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, which are easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) achieves very good classification performance and learns well from data streams containing only positive and unlabeled examples. Furthermore, PUVFDT reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT still achieves classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
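The abstract gives no formulas, so what follows is only a minimal sketch of how a decision-tree learner can score a split from positive and unlabeled counts alone once the percentage of positive examples is known. It assumes a POSC4.5-style estimate, which is an assumption here and not necessarily the exact formula used by OcVFDT or PUVFDT: if the labeled positives are a random sample of all positives, the number of positives among the unlabeled examples reaching a node is roughly pos_level * |UNL| * |POS_n| / |POS|, where pos_level is the assumed positive percentage. All function and parameter names are hypothetical, and only nominal attributes are handled.

```python
import math
from typing import Mapping


def positive_fraction(pos_n: int, unl_n: int, pos_total: int, unl_total: int,
                      pos_level: float) -> float:
    """Estimate the positive-class proportion at a node (POSC4.5-style, assumed).

    pos_n / unl_n: positive / unlabeled examples reaching the node.
    pos_total / unl_total: positive / unlabeled examples seen at the root.
    pos_level: assumed fraction of positives in the stream (hypothetical name).
    """
    if pos_total == 0 or pos_n == 0:
        return 0.0
    if unl_n == 0:
        return 1.0  # only labeled positives reach this node
    # Expected positives among the node's unlabeled examples, divided by unl_n,
    # clipped so the estimate stays a valid probability.
    return min(1.0, (pos_n / pos_total) * (unl_total / unl_n) * pos_level)


def entropy(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pu_info_gain(pos_by_value: Mapping[str, int], unl_by_value: Mapping[str, int],
                 pos_total: int, unl_total: int, pos_level: float) -> float:
    """Information gain of a nominal split computed from PU counts only.

    Children are weighted by their share of the node's unlabeled examples
    (an assumption of this sketch).
    """
    pos_node = sum(pos_by_value.values())
    unl_node = sum(unl_by_value.values())
    if unl_node == 0:
        return 0.0
    parent = entropy(positive_fraction(pos_node, unl_node, pos_total,
                                       unl_total, pos_level))
    children = 0.0
    for value, unl_v in unl_by_value.items():
        pos_v = pos_by_value.get(value, 0)
        p = positive_fraction(pos_v, unl_v, pos_total, unl_total, pos_level)
        children += (unl_v / unl_node) * entropy(p)
    return parent - children
```

At the root (pos_n equal to pos_total and unl_n equal to unl_total) this estimate reduces to pos_level itself, so every entropy in the tree depends on it; that is one way to see why an accurate estimate of the positive percentage matters when no negative labels are available.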
Pages: 405-430
Page count: 26
Related papers (50 in total)
  • [41] A Quantum-Inspired Direct Learning Strategy for Positive and Unlabeled Data
    Zhang, Chenguang
    Du, Xuejiao
    Zhang, Yan
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2023, 16 (01)
  • [42] An Active Learning Based on Uncertainty and Density Method for Positive and Unlabeled Data
    Luo, Jun
    Zhou, Wenan
    Du, Yu
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 229 - 241
  • [43] Active Learning for Multivariate Time Series Classification with Positive Unlabeled Data
    He, Guoliang
    Duan, Yong
    Li, Yifei
    Qian, Tieyun
    He, Jinrong
    Jia, Xiangyang
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 178 - 185
  • [44] An Overview on Learning from Data Streams
    Gama, João
    Rodrigues, Pedro
    Aguilar-Ruiz, Jesús
    New Generation Computing, 2006, 25 (1) : 1 - 4
  • [45] Active learning from data streams
    Zhu, Xingquan
    Zhang, Peng
    Lin, Xiaodong
    Shi, Yong
    ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 757 - +
  • [46] Learning from Noisy Pairwise Similarity and Unlabeled Data
    Wu, Songhua
    Liu, Tongliang
    Han, Bo
    Yu, Jun
    Niu, Gang
    Sugiyama, Masashi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [47] Word sense disambiguation by learning from unlabeled data
    Park, SB
    Zhang, BT
    Kim, YT
    38TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2000, : 547 - 554
  • [48] On the complexity of learning a class ratio from unlabeled data
    Fish B.
    Reyzin L.
    Journal of Artificial Intelligence Research, 2020, 69 : 1333 - 1349