Selective Sampling Designs to Improve the Performance of Classification Methods

Cited by: 1
Authors
Ghorbani, Soroosh [1]
Desmarais, Michel C. [1]
Affiliation
[1] Computer & Software Engineering Dept., Montreal, PQ, Canada
Keywords
Planned Missing Data Design; Selective Sampling; Classification
DOI
10.1109/ICMLA.2013.187
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Selective Sampling design refers to the situation where a study has a fixed number of observations but can decide how to allocate them among the variables during the data-gathering phase, so that some variables will have a greater ratio of missing values than others. In particular, we can decide to allocate more or fewer missing values according to each variable's uncertainty: variables whose relative frequency is closer to 50% have higher uncertainty, and those whose relative frequency is further from 50% have lower uncertainty. The main objective of the study is to investigate how a Selective Sampling process can improve the performance of classification methods. Specifically, this study asks: "Can Selective Sampling affect the performance of classification methods?" We focus on three classification models for binary datasets: Naive Bayes, Logistic Regression, and Tree Augmented Naive Bayes (TAN). Three sampling schemes are defined: (1) Uniform (random samples), as a baseline; (2) Most Uncertain (higher sampling rate of uncertain items); and (3) Least Uncertain (lower sampling rate of uncertain items). We investigate the impact of these schemes on the performance of the three models on 11 datasets. The results from 100-fold cross-validation show that Selective Sampling improves the prediction performance of the TAN model on all of the datasets and, on more than half of the datasets (54.6%), yields higher prediction performance for the Naive Bayes and Logistic Regression classifiers.
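The three schemes differ only in how a fixed budget of observed values is spread across the variables. The Python sketch below illustrates one possible way to implement them; it is not the authors' procedure. It assumes that uncertainty is measured as p(1 - p), where p is a variable's relative frequency of 1s computed from the complete data, and that cells are kept with weights proportional to that uncertainty (Most Uncertain) or to its complement (Least Uncertain). The function name `selective_sample`, the parameters `budget_ratio` and `eps`, and the weighting rule are all hypothetical.

```python
import numpy as np

def selective_sample(data, scheme, budget_ratio, rng, eps=0.01):
    """Keep a fixed number of cells of a binary matrix and set the rest to NaN.

    Hypothetical sketch, not the paper's exact method.
    data: (n_obs, n_vars) array of 0/1 values.
    scheme: "uniform", "most_uncertain" or "least_uncertain".
    budget_ratio: fraction of all cells that remain observed.
    eps: small positive weight so that no variable is never sampled.
    """
    n_obs, n_vars = data.shape
    p = data.mean(axis=0)              # relative frequency of 1s per variable
    u = p * (1.0 - p)                  # assumed uncertainty, maximal (0.25) at p = 0.5
    if scheme == "uniform":
        col_w = np.ones(n_vars)        # baseline: every variable weighted equally
    elif scheme == "most_uncertain":
        col_w = u + eps                # favour variables close to 50%
    elif scheme == "least_uncertain":
        col_w = (0.25 - u) + eps       # favour variables far from 50%
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Each cell in column j inherits weight col_w[j] (row-major flattening).
    cell_w = np.tile(col_w, n_obs)
    budget = int(round(budget_ratio * n_obs * n_vars))
    # Draw `budget` distinct cells without replacement, favouring heavier columns.
    kept = rng.choice(n_obs * n_vars, size=budget, replace=False,
                      p=cell_w / cell_w.sum())
    out = np.full(n_obs * n_vars, np.nan)
    out[kept] = data.ravel()[kept]
    return out.reshape(n_obs, n_vars)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 200 observations of 4 binary variables.
    data = (rng.random((200, 4)) < [0.5, 0.6, 0.9, 0.97]).astype(int)
    for scheme in ("uniform", "most_uncertain", "least_uncertain"):
        sampled = selective_sample(data, scheme, budget_ratio=0.5, rng=rng)
        # Per-variable missing-value ratio under each scheme.
        print(scheme, np.isnan(sampled).mean(axis=0).round(2))
```

All three schemes keep the same total number of observed values, so any difference in classifier performance can be attributed to how the missing values are distributed rather than to how many there are. The masked matrix could then be passed to a classifier that tolerates missing inputs.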
Pages: 178-181
Page count: 4