Similarity-driven sampling for data mining

Cited by: 0
Authors
Reinartz, T [1]
Affiliations
[1] Daimler Benz AG, Res & Technol, D-89013 Ulm, Germany
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Industrial databases often contain millions of tuples, but most data mining algorithms are applicable only to small sets of examples. In this paper, we propose to utilize data reduction before data mining to overcome this deficit. We specifically present a novel similarity-driven sampling approach which applies two preparation steps, sorting and stratification, and reuses an improved variant of leader clustering. We experimentally evaluate similarity-driven sampling in comparison to statistical sampling techniques in different classification domains using C4.5 and instance-based learning as data mining algorithms. Experimental results show that similarity-driven sampling often outperforms statistical sampling techniques in terms of error rates using smaller samples.
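The abstract names three ingredients: sorting, stratification, and leader clustering. The sketch below illustrates the general idea with textbook leader clustering only; it is not the paper's improved variant. Several details are assumptions made for illustration and are not taken from the abstract: strata are formed by class label, tuples are sorted lexicographically within each stratum, similarity is measured by Euclidean distance, and the function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_sample(points: Sequence[Point], threshold: float) -> List[Point]:
    """Textbook leader clustering in a single pass over the data.

    A point becomes a new leader when no existing leader lies within
    `threshold` of it. The leaders are returned as the sample.
    """
    leaders: List[Point] = []
    for p in points:
        if not any(euclidean(p, leader) <= threshold for leader in leaders):
            leaders.append(p)
    return leaders


def stratified_leader_sample(
    points: Sequence[Point],
    labels: Sequence[Hashable],
    threshold: float,
) -> List[Tuple[Point, Hashable]]:
    """Stratify, sort, then run leader clustering inside each stratum.

    Assumption: strata are class labels and sorting is lexicographic.
    The paper may define both preparation steps differently.
    """
    strata: Dict[Hashable, List[Point]] = {}
    for p, y in zip(points, labels):
        strata.setdefault(y, []).append(p)

    sample: List[Tuple[Point, Hashable]] = []
    for y, members in strata.items():
        members.sort()  # sorting fixes the order in which leaders are chosen
        sample.extend((leader, y) for leader in leader_sample(members, threshold))
    return sample


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.0)]
    classes = ["a", "a", "b", "b", "a"]
    print(stratified_leader_sample(data, classes, threshold=1.0))
```

In this sketch a larger `threshold` yields fewer leaders and therefore a smaller sample, and sorting matters because leader clustering depends on the order in which tuples are visited.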
Pages: 423-431
Page count: 9
Related papers
50 in total
  • [21] Improving Similarity-Driven Library Design: Customized Matching and Regioselective Feature Trees
    Fischer, J. Robert
    Lessel, Uta
    Rarey, Matthias
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2011, 51 (09) : 2156 - 2163
  • [22] Publisher Correction: Similarity-driven multi-view embeddings from high-dimensional biomedical data
    Brian B. Avants
    Nicholas J. Tustison
    James R. Stone
    Nature Computational Science, 2021, 1 : 239 - 239
  • [23] Similarity-Driven Topology Optimization for Statics and Crash via Energy Scaling Method
    Yousaf, Muhammad Salman
    Detwiler, Duane
    Duddeck, Fabian
    Menzel, Stefan
    Ramnath, Satchit
    Zurbrugg, Nathan
    Bujny, Mariusz
    JOURNAL OF MECHANICAL DESIGN, 2023, 145 (10)
  • [24] Similarity-driven pictorial database design using entity-relationship approach
    Lee, ET
    Lee, ME
    KYBERNETES, 1999, 28 (2-3) : 298 - 305
  • [25] SGCN: Structure and Similarity-Driven Graph Convolutional Network for Semi-Supervised Classification
    Guo, Wenqiang
    Hu, Yonglong
    Hou, Yongyan
    Xue, Bofeng
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (12) : 973 - 982
  • [26] Penalized-Likelihood PET Image Reconstruction Using Similarity-Driven Median Regularization
    Ren, Xue
    Jung, Ji Eun
    Zhu, Wen
    Lee, Soo-Jin
    TOMOGRAPHY, 2022, 8 (01) : 158 - 174
  • [27] An unsupervised anomalous sound detection method based on similarity-driven automatic feature selection
    Zhang, Yi
    Feng, Jie
    Zhang, Qiaoling
    Hu, Junyao
    Zhang, Weiwei
    DIGITAL SIGNAL PROCESSING, 2025, 161
  • [28] Similarity-Driven Fine-Tuning Methods for Regularization Parameter Optimization in PET Image Reconstruction
    Zhu, Wen
    Lee, Soo-Jin
    SENSORS, 2023, 23 (13)
  • [29] Similarity-driven truncated aggregation framework for privacy-preserving short term load forecasting
    Khan, Ahsan Raza
    Al-Quraan, Mohammad
    Mohjazi, Lina
    Flynn, David
    Imran, Muhammad Ali
    Zoha, Ahmed
    INTERNET OF THINGS, 2025, 31
  • [30] A triplet graph convolutional network with attention and similarity-driven dictionary learning for remote sensing image retrieval
    Regan, Jacob
    Khodayar, Mahdi
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 232