Selective Sampling Designs to Improve the Performance of Classification Methods

Cited by: 1
Authors
Ghorbani, Soroosh [1 ]
Desmarais, Michel C. [1 ]
Affiliations
[1] Comp & Software Engn Dept, Montreal, PQ, Canada
Keywords
Planned Missing Data Design; Selective Sampling; Classification;
DOI
10.1109/ICMLA.2013.187
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A Selective Sampling design refers to the situation where a study has a fixed number of observations but can allocate them differently among the variables during the data-gathering phase, so that some variables have a greater ratio of missing values than others. In particular, we can allocate more or fewer missing values to uncertain variables: those whose relative frequency is closer to 50% (higher uncertainty), as opposed to further from 50% (lower uncertainty). The main objective of the study is to investigate whether a Selective Sampling process helps improve the performance of classification methods. Specifically, it asks: "Can Selective Sampling affect the performance of classification methods?" We focus on three classification models for binary datasets: Naive Bayes, Logistic Regression, and Tree Augmented Naive Bayes (TAN). Three sampling schemes are defined: (1) Uniform (random samples) as a baseline, (2) Most Uncertain (higher sampling rate of uncertain items), and (3) Least Uncertain (lower sampling rate of uncertain items). We investigate the impact of these schemes on the performance of the three models on 11 datasets. The results from 100-fold cross-validation show that Selective Sampling improves the prediction performance of the TAN model in all of the datasets and, in more than half of the datasets (54.6%), yields higher prediction performance for the Naive Bayes and Logistic Regression classifiers.
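The three schemes described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how per-variable sampling rates are derived, so everything below is an assumption made for illustration: the uncertainty score (1 at a relative frequency of 50%, 0 at 0% or 100%), the linear mapping from score to rate, and the names `uncertainty`, `sampling_rates`, `apply_design`, `mean_rate`, and `spread` are hypothetical and are not taken from the paper. The sketch only builds the missing-data pattern; it does not train or evaluate any classifier.

```python
import random


def uncertainty(column):
    """Score a binary variable: 1.0 at a 50% relative frequency, 0.0 at 0% or 100%.

    Assumed measure; the abstract only says uncertainty grows as the
    relative frequency approaches 50%.
    """
    p = sum(column) / len(column)
    return 1.0 - 2.0 * abs(p - 0.5)


def sampling_rates(pilot, scheme, mean_rate=0.5, spread=0.3):
    """Return one observation rate per variable, averaging to mean_rate.

    Keeping the average fixed stands in for the fixed observation budget.
    The linear score-to-rate mapping is an illustrative assumption.
    """
    n_vars = len(pilot[0])
    if scheme == "uniform":
        return [mean_rate] * n_vars
    if scheme not in ("most_uncertain", "least_uncertain"):
        raise ValueError(f"unknown scheme: {scheme}")
    scores = [uncertainty([row[j] for row in pilot]) for j in range(n_vars)]
    mean_score = sum(scores) / n_vars
    sign = 1.0 if scheme == "most_uncertain" else -1.0
    centred = [sign * (s - mean_score) for s in scores]
    biggest = max(abs(c) for c in centred) or 1.0  # all scores equal: no shift
    return [mean_rate + spread * c / biggest for c in centred]


def apply_design(data, rates, rng):
    """Copy data, setting cells to None (missing) so that variable j keeps
    round(rates[j] * n_rows) observed values, chosen at random."""
    n_rows = len(data)
    masked = [list(row) for row in data]
    for j, rate in enumerate(rates):
        n_missing = n_rows - round(rate * n_rows)
        for i in rng.sample(range(n_rows), n_missing):
            masked[i][j] = None
    return masked


# Toy binary data: relative frequencies of 1 are 50%, 90%, and 70%.
pilot = [[int(i < 5), int(i < 9), int(i < 7)] for i in range(10)]
rng = random.Random(0)

for scheme in ("uniform", "most_uncertain", "least_uncertain"):
    rates = sampling_rates(pilot, scheme)
    masked = apply_design(pilot, rates, rng)
    observed = [sum(row[j] is not None for row in masked) for j in range(3)]
    print(scheme, [round(r, 2) for r in rates], observed)
```

In this toy setup every scheme spends the same total number of observations; only their distribution over the variables changes. Under "most_uncertain" the 50% variable is observed most often, and under "least_uncertain" the 90% variable is.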
Pages: 178-181
Page count: 4
Related papers
50 in total
  • [1] Selective sampling methods in one-class classification problems
    Juszczak, P
    Duin, RPW
    ARTIFICIAL NEURAL NETWORKS AND NEURAL INFORMATION PROCESSING - ICANN/ICONIP 2003, 2003, 2714 : 140 - 148
  • [2] Selective sampling for classification
    Laviolette, Francois
    Marchand, Mario
    Shanian, Sara
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 191 - 202
  • [3] Selective Sampling on Graphs for Classification
    Gu, Quanquan
    Aggarwal, Charu
    Liu, Jialu
    Han, Jiawei
    19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 131 - 139
  • [4] The effect of quantization on the performance of sampling designs
    Benhenni, K
    Cambanis, S
    IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (05) : 1981 - 1992
  • [5] Sampling Methods and Survey Designs for Larval Lampreys
    Clemens, Benjamin J.
    Harris, Julianne E.
    Starcevich, Steven J.
    Evans, Thomas M.
    Skalicky, Joseph J.
    Neave, Fraser
    Lampman, Ralph T.
    NORTH AMERICAN JOURNAL OF FISHERIES MANAGEMENT, 2022, 42 (02) : 455 - 474
  • [6] Investigation of PNN Optimization Methods to Improve Classification Performance in Transplantation Medicine
    Havryliuk, Myroslav
    Hovdysh, Nazarii
    Tolstyak, Yaroslav
    Chopyak, Valentyna
    Kustra, Natalya
    6TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE, IDDM 2023, 2023, 3609
  • [7] Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification
    Nejatian, Samad
    Parvin, Hamid
    Faraji, Eshagh
    NEUROCOMPUTING, 2018, 276 : 55 - 66
  • [8] Optimization of Klystron Designs Using Deterministic Sampling Methods
    Hien Tran
    Lankford, George
    Read, Michael E.
    Ives, R. Lawrence
    Reppert, Kelsey
    Cline, Kayla
    Guzman, Juan
    IEEE TRANSACTIONS ON ELECTRON DEVICES, 2015, 62 (03) : 1032 - 1036
  • [9] Estimation in Complex Sampling Designs Based on Resampling Methods
    Bardia Panahbehagh
    Journal of Agricultural, Biological and Environmental Statistics, 2020, 25 : 206 - 228
  • [10] Estimation in Complex Sampling Designs Based on Resampling Methods
    Panahbehagh, Bardia
    JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2020, 25 (02) : 206 - 228