Fast-RCM: Fast Tree-Based Unsupervised Rare-Class Mining

Authors
Weng, Haiqin [1]
Ji, Shouling [1, 2, 3]
Liu, Changchang [4]
Wang, Ting [5]
He, Qinming [1]
Chen, Jianhai [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Inst Cyberspace Res, Hangzhou 310027, Peoples R China
[3] Zhejiang Univ, Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou 310027, Peoples R China
[4] IBM Thomas J Watson Res Ctr, Dept Distributed AI, Yorktown Hts, NY 10598 USA
[5] Lehigh Univ, Dept Comp Sci, Bethlehem, PA 18015 USA
Keywords
Anomaly detection; Diseases; Vegetation; Approximation algorithms; Time complexity; Computer science; Clustering methods; data mining; tree data structures; CATEGORY DETECTION
DOI
10.1109/TCYB.2019.2924804
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Rare classes are usually hidden in an imbalanced dataset in which most of the data examples come from major classes. Rare-class mining (RCM) aims to extract all the data examples belonging to rare classes. Most existing approaches to RCM require a certain amount of labeled data examples as input. However, they are impractical because requesting label information from domain experts is time-consuming and labor-intensive. Thus, we investigate the unsupervised RCM problem; to the best of our knowledge, this is the first such attempt. To this end, we propose an efficient algorithm for unsupervised RCM called Fast-RCM, which has approximately linear time complexity with respect to data size and data dimensionality. Given an unlabeled dataset, Fast-RCM mines the rare classes by first building a rare tree for the input dataset and then extracting the data examples of the rare classes based on this rare tree. Compared with existing approaches, which have quadratic or even cubic time complexity, Fast-RCM is much faster and can be extended to large-scale datasets. The experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithm can effectively and efficiently extract the rare classes from an unlabeled dataset in the unsupervised setting, and that it is approximately five times faster than state-of-the-art methods.
Pages: 5198-5211
Page count: 14