Selective Sampling Designs to Improve the Performance of Classification Methods

Times cited: 1
Authors
Ghorbani, Soroosh [1]
Desmarais, Michel C. [1]
Affiliation
[1] Computer and Software Engineering Department, Montreal, PQ, Canada
Keywords
Planned Missing Data Design; Selective Sampling; Classification
DOI
10.1109/ICMLA.2013.187
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Selective Sampling design refers to the situation where a study has a fixed number of observations but can decide, during the data-gathering phase, how to allocate them among the variables, so that some variables will have a greater ratio of missing values than others. In particular, we can decide to allocate more or fewer missing values to uncertain variables: those whose relative frequency is closer to 50% (higher uncertainty) or further from 50% (lower uncertainty). The main objective of the study is to investigate how a Selective Sampling process can improve the performance of classification methods. Specifically, this study asks: "Can Selective Sampling affect the performance of classification methods?" We focus on three classification models for binary datasets: Naive Bayes, Logistic Regression, and Tree Augmented Naive Bayes (TAN). Three sampling schemes are defined: (1) Uniform (random samples) as a baseline, (2) Most Uncertain (higher sampling rate of uncertain items), and (3) Least Uncertain (lower sampling rate of uncertain items). We investigate the impact of these schemes on the performance of the three models on 11 datasets. The results from 100-fold cross-validation show that Selective Sampling improves the prediction performance of the TAN model on all of the datasets and, on more than half of the datasets (54.6%), yields higher prediction performance for the Naive Bayes and Logistic Regression classifiers.
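As a rough illustration of the three sampling schemes described in the abstract, the Python sketch below splits a fixed budget of observed cells across the binary variables of a dataset and marks every unsampled cell as missing. The uncertainty measure 4p(1 - p), the proportional allocation rule, the small `eps` weight, and the names `uncertainty`, `allocate`, and `sample` are hypothetical assumptions made for illustration; the abstract does not state how the paper weights variables or allocates samples.

```python
import math
import random
from typing import List, Optional

# Hypothetical sketch: the uncertainty measure, allocation rule, and function
# names are illustrative assumptions, not the paper's implementation.


def uncertainty(column: List[int]) -> float:
    """Return 4p(1 - p) for a 0/1 column: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    p = sum(column) / len(column)
    return 4.0 * p * (1.0 - p)


def allocate(data: List[List[int]], budget: int, scheme: str) -> List[int]:
    """Split a fixed budget of observed cells among the variables (columns)."""
    n, k = len(data), len(data[0])
    if budget > n * k:
        raise ValueError("budget exceeds the number of cells")
    u = [uncertainty([row[j] for row in data]) for j in range(k)]
    eps = 1e-6  # assumed small constant that keeps every weight positive
    if scheme == "uniform":
        weights = [1.0] * k
    elif scheme == "most_uncertain":  # sample uncertain variables more often
        weights = [uj + eps for uj in u]
    elif scheme == "least_uncertain":  # sample uncertain variables less often
        weights = [1.0 - uj + eps for uj in u]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(weights)
    # Proportional share, rounded down and capped at n observations per column.
    counts = [min(n, math.floor(budget * w / total)) for w in weights]
    # Hand out the remaining observations one at a time, highest weight first.
    leftover = budget - sum(counts)
    order = sorted(range(k), key=lambda j: -weights[j])
    while leftover > 0:
        for j in order:
            if leftover > 0 and counts[j] < n:
                counts[j] += 1
                leftover -= 1
    return counts


def sample(data: List[List[int]], budget: int, scheme: str,
           seed: int = 0) -> List[List[Optional[int]]]:
    """Return a copy of data in which unsampled cells are None (missing)."""
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    observed: List[List[Optional[int]]] = [[None] * k for _ in range(n)]
    for j, count in enumerate(allocate(data, budget, scheme)):
        for i in rng.sample(range(n), count):  # rows kept for column j
            observed[i][j] = data[i][j]
    return observed


if __name__ == "__main__":
    # Column 0 is balanced (p = 0.5), column 1 is skewed, column 2 is constant.
    data = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0],
            [1, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
    for scheme in ("uniform", "most_uncertain", "least_uncertain"):
        print(scheme, allocate(data, 12, scheme))
    # uniform [4, 4, 4]
    # most_uncertain [8, 4, 0]
    # least_uncertain [0, 4, 8]
```

Under these assumptions, Most Uncertain keeps the balanced first column fully observed and leaves the constant third column entirely missing, while Least Uncertain does the reverse; Uniform spreads the same budget evenly.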
Pages: 178-181
Page count: 4