Similarity-driven sampling for data mining

Cited by: 0
Authors
Reinartz, T [1]
Affiliation
[1] Daimler Benz AG, Res & Technol, D-89013 Ulm, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Industrial databases often contain millions of tuples, but most data mining algorithms are applicable only to small sets of examples. In this paper, we propose to apply data reduction before data mining to overcome this limitation. Specifically, we present a novel similarity-driven sampling approach that applies two preparation steps, sorting and stratification, and reuses an improved variant of leader clustering. We experimentally compare similarity-driven sampling with statistical sampling techniques in different classification domains, using C4.5 and instance-based learning as data mining algorithms. The experimental results show that similarity-driven sampling often outperforms statistical sampling techniques in terms of error rates while using smaller samples.
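The abstract names three ingredients: sorting, stratification, and leader clustering. The following is only a minimal sketch of how these could fit together, not the method of the paper. It uses plain leader clustering rather than the improved variant, and it assumes stratification by class label, lexicographic sorting of feature tuples, Euclidean distance, and a fixed distance threshold. The names `euclidean`, `leader_sample`, `similarity_driven_sample`, and `threshold` are hypothetical and do not come from the record.

```python
from collections import defaultdict
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_sample(rows, threshold):
    """Plain leader clustering in a single pass.

    A row becomes a new leader only if it is farther than `threshold`
    from every leader chosen so far; the leaders form the sample.
    """
    leaders = []
    for row in rows:
        if all(euclidean(row, leader) > threshold for leader in leaders):
            leaders.append(row)
    return leaders


def similarity_driven_sample(examples, threshold):
    """Hypothetical pipeline: stratify, sort, then pick leaders per stratum.

    `examples` is a list of (features, label) pairs, where `features`
    is a tuple of numbers. Stratifying by label is an assumption.
    """
    # Stratification (assumed here: one stratum per class label).
    strata = defaultdict(list)
    for features, label in examples:
        strata[label].append(features)

    sample = []
    for label in sorted(strata):
        # Sorting (assumed here: lexicographic order of feature tuples),
        # which makes the order-dependent leader pass deterministic.
        rows = sorted(strata[label])
        for leader in leader_sample(rows, threshold):
            sample.append((leader, label))
    return sample


if __name__ == "__main__":
    data = [
        ((0.0, 0.0), "a"),
        ((0.1, 0.0), "a"),
        ((5.0, 5.0), "a"),
        ((0.0, 0.1), "b"),
        ((0.05, 0.1), "b"),
    ]
    for features, label in similarity_driven_sample(data, threshold=1.0):
        print(label, features)
```

In this sketch a larger `threshold` yields fewer leaders and therefore a smaller sample, so the threshold plays the role that the sample size plays in statistical sampling.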
Pages: 423 - 431
Number of pages: 9
Related papers
50 in total
  • [41] Mining on the Basis of Similarity in Graph and Image Data
    Srivastava, Vishal
    Biswas, Bhaskar
    ADVANCED INFORMATICS FOR COMPUTING RESEARCH, PT II, 2019, 956 : 193 - 203
  • [42] Similarity problems in time series data mining
    Yan, XB
    Li, YJ
    Fan, B
    PROCEEDINGS OF 2003 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2003 : 382 - 385
  • [43] The Calculation of Similarity and Its Application in Data Mining
    Teng, Shaohua
    Li, Junlei
    Li, Rigui
    Zhang, Wei
    PERVASIVE COMPUTING AND THE NETWORKED WORLD, 2014, 8351 : 563 - 574
  • [44] Data driven Dirichlet sampling on manifolds
    Prado, Luan S.
    Ritto, Thiago G.
    Journal of Computational Physics, 2021, 444
  • [45] Similarity-driven multi-view embeddings from high-dimensional biomedical data (Feb, 10.1038/s43588-021-00029-8, 2021)
    Avants, Brian B.
    Tustison, Nicholas J.
    Stone, James R.
    NATURE COMPUTATIONAL SCIENCE, 2021, 1 (03) : 239 - 239
  • [46] Data driven Dirichlet sampling on manifolds
    Prado, Luan S.
    Ritto, Thiago G.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2021, 444
  • [47] Data Driven Sampling of Oscillating Signals
    Brigitte Bidegaray-Fesquet
    Marianne Clausel
    Sampling Theory in Signal and Image Processing, 2014, 13 (2) : 175 - 187
  • [48] DATA MINING DRIVEN DECISION MAKING
    Sokolova, Marina V.
    Fernandez-Caballero, Antonio
    ICAART 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, 2009, : 220 - +
  • [49] AN ONTOLOGY DRIVEN DATA MINING PROCESS
    Brisson, Laurent
    Collard, Martine
    ICEIS 2008: PROCEEDINGS OF THE TENTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL AIDSS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2008, : 54 - +
  • [50] Role of sampling in data mining for association rules
    Jeragh, M
    Mehrotra, KG
    IC-AI'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS I-III, 2001 : 483 - 489