Similarity-driven sampling for data mining

Cited by: 0
Author
Reinartz, T [1]
Affiliation
[1] Daimler Benz AG, Res & Technol, D-89013 Ulm, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Industrial databases often contain millions of tuples, but most data mining algorithms are applicable only to small sets of examples. In this paper, we propose data reduction before data mining to overcome this limitation. Specifically, we present a novel similarity-driven sampling approach that applies two preparation steps, sorting and stratification, and reuses an improved variant of leader clustering. We experimentally compare similarity-driven sampling with statistical sampling techniques in different classification domains, using C4.5 and instance-based learning as data mining algorithms. The results show that similarity-driven sampling often outperforms statistical sampling techniques in terms of error rates while using smaller samples.
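As a rough illustration of the leader clustering idea the abstract mentions, the sketch below uses the textbook, single-pass form of leader clustering to pick a sample. It is not the paper's improved variant, and it omits the sorting and stratification steps. The function name `leader_sample`, the Euclidean distance, and the `threshold` parameter are assumptions made for this example only.

```python
import math
from typing import List, Sequence


def leader_sample(rows: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of 'leader' rows found by plain leader clustering.

    Hypothetical helper: a row becomes a leader when it lies farther than
    `threshold` (Euclidean distance, an assumption) from every leader
    chosen so far. The leaders serve as the reduced sample.
    """
    leaders: List[int] = []
    for i, row in enumerate(rows):
        # Single pass over the data: compare the row only against current leaders.
        if all(math.dist(row, rows[j]) > threshold for j in leaders):
            leaders.append(i)
    return leaders


# Toy data: two tight groups of points in the plane.
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
sample = leader_sample(data, threshold=1.0)
print(sample)  # [0, 2]
```

On this toy input the sample keeps one representative per group. Because the result depends on the order in which rows are visited, a preparation step such as sorting plausibly matters, though how the paper uses it is not stated in the abstract.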
Pages: 423-431
Page count: 9
Related papers
50 in total
  • [1] Similarity-driven software reuse
    Bildhauer, Daniel
    Horn, Tassilo
    Ebert, Juergen
    2009 ICSE WORKSHOP ON COMPARISON AND VERSIONING OF SOFTWARE MODELS, 2009, : 31 - 36
  • [2] Similarity-driven flexible ligand docking
    Fradera, X
    Knegtel, RMA
    Mestres, J
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2000, 40 (04) : 623 - 636
  • [3] LoFT: Similarity-Driven Multiobjective Focused Library Design
    Fischer, J. Robert
    Lessel, Uta
    Rarey, Matthias
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2010, 50 (01) : 1 - 21
  • [4] Similarity-driven adversarial testing of neural networks
    Filus, Katarzyna
    Domanska, Joanna
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [5] Similarity-driven flexible ligand docking.
    Mestres, J
    Fradera, X
    Knegtel, RMA
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2000, 219 : U608 - U609
  • [6] Similarity-driven defuzzification of fuzzy tuples for entropy-based data classification purposes
    Angryk, Rafal A.
    2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2006, : 414 - 422
  • [7] Similarity-Driven Edge Bundling: Data-Oriented Clutter Reduction in Graph Layouts
    Sikansi, Fabio
    da Silva, Renato R. O.
    Cantareira, Gabriel D.
    Etemad, Elham
    Paulovich, Fernando V.
    ALGORITHMS, 2020, 13 (11) : 1 - 27
  • [8] A method of similarity-driven knowledge revision for type specializations
    Morita, N
    Haraguchi, M
    Okubo, Y
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 1999, 1720 : 194 - 205
  • [9] Similarity-driven topology finding of surface patterns for structural design
    Oval, R.
    Mesnil, R.
    Van Mele, T.
    Baverel, O.
    Block, P.
    COMPUTER-AIDED DESIGN, 2024, 176
  • [10] Similarity-Driven Semantic Role Induction via Graph Partitioning
    Lang, Joel
    Lapata, Mirella
    COMPUTATIONAL LINGUISTICS, 2014, 40 (03) : 633 - 669