Learning from data streams with only positive and unlabeled data

Cited by: 13
Authors
Qin, Xiangju [1]
Zhang, Yang [1, 2]
Li, Chen [1]
Li, Xue [3]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves; Decision trees; Classification
DOI
10.1007/s10844-012-0231-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
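The positive-and-unlabeled stream setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental protocol: the function name `to_pu_stream`, the parameter `label_prob`, and the rule that each positive example keeps its label independently with a fixed probability are all assumptions made for illustration. The sketch only shows how a fully labeled stream can be turned into one in which the learner sees known positives and unlabeled examples, and never a labeled negative.

```python
import random
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (features, true label) where 1 = positive and 0 = negative
Example = Tuple[Dict[str, float], int]
# (features, observed label) where 1 = known positive and None = unlabeled
PUExample = Tuple[Dict[str, float], Optional[int]]


def to_pu_stream(
    stream: Iterable[Example],
    label_prob: float,
    seed: int = 0,
) -> Iterator[PUExample]:
    """Hide labels so that only some positives stay labeled.

    Hypothetical helper: each positive keeps its label with
    probability `label_prob` (an assumption, not taken from the
    paper); every negative is always emitted as unlabeled.
    """
    rng = random.Random(seed)
    for features, label in stream:
        if label == 1 and rng.random() < label_prob:
            yield features, 1      # labeled positive
        else:
            yield features, None   # unlabeled (true class hidden)


if __name__ == "__main__":
    # Toy stream: even indices are negative, odd indices are positive.
    labeled = [({"x": float(i)}, i % 2) for i in range(10)]
    for features, observed in to_pu_stream(labeled, label_prob=0.4):
        print(features, observed)
```

A learner consuming this stream has to estimate the share of positives hidden among the unlabeled examples, which is the quantity the abstract says PUVFDT estimates efficiently.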
Pages: 405-430
Number of pages: 26
Related papers
50 results in total
  • [31] Learning from Positive and Unlabeled Data without Explicit Estimation of Class Prior
    Zhang, Chenguang
    Hou, Yuexian
    Zhang, Yan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6762 - 6769
  • [32] Efficient learning of unlabeled term trees with contractible variables from positive data
    Suzuki, Y
    Shoudai, T
    Matsumoto, S
    Uchida, T
    INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS, 2003, 2835 : 347 - 364
  • [33] Beyond the Selected Completely at Random Assumption for Learning from Positive and Unlabeled Data
    Bekker, Jessa
    Robberechts, Pieter
    Davis, Jesse
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 11907 : 71 - 85
  • [34] Learning Compression from Limited Unlabeled Data
    He, Xiangyu
    Cheng, Jian
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 778 - 795
  • [35] Classification from Positive, Unlabeled and Biased Negative Data
    Hsieh, Yu-Guan
    Niu, Gang
    Sugiyama, Masashi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [36] Class Prior Estimation from Positive and Unlabeled Data
    Du Plessis, Marthinus Christoffel
    Sugiyama, Masashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (05): 1358 - 1362
  • [37] Fast Factorization-Free Kernel Learning for Unlabeled Chunk Data Streams
    Wang, Yi
    Xue, Nan
    Fan, Xin
    Luo, Jiebo
    Liu, Risheng
    Chen, Bin
    Li, Haojie
    Luo, Zhongxuan
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 2833 - 2839
  • [38] Two birds with one stone: Classifying positive and unlabeled examples on uncertain data streams
    Han, Donghong
    Li, Shuoru
    Wei, Fulin
    Tang, Yuying
    Zhu, Feida
    Wang, Guoren
    NEUROCOMPUTING, 2018, 277 : 149 - 160
  • [39] Learning from not-all-negative pairwise data and unlabeled data
    Huang, Shuying
    Li, Junpeng
    Hua, Changchun
    Yang, Yana
    PATTERN RECOGNITION, 2025, 163
  • [40] A Quantum-Inspired Direct Learning Strategy for Positive and Unlabeled Data
    Chenguang Zhang
    Xuejiao Du
    Yan Zhang
    International Journal of Computational Intelligence Systems, 16