Improving Model Selection by Employing the Test Data

Times cited: 0
Authors
Westphal, Max [1]
Brannath, Werner [1]
Affiliation
[1] Univ Bremen, Fac Math & Comp Sci 3, Inst Stat, Bremen, Germany
Keywords
MULTIPLE COMPARISONS; OVER-OPTIMISM; BIOINFORMATICS; INTELLIGENCE; INFERENCE; DESIGN
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Model selection and evaluation are usually strictly separated by means of data splitting to enable unbiased estimation of, and simple statistical inference for, the unknown generalization performance of the final prediction model. We investigate the properties of novel evaluation strategies in which the final model is selected based on its empirical performance on the test data. To guard against selection-induced overoptimism, we employ a parametric multiple test correction based on the approximate multivariate distribution of the performance estimates. Our numerical experiments involve training common machine learning algorithms, namely elastic net (EN), classification and regression trees (CART), support vector machines (SVM), and XGBoost (XGB), on various artificial classification tasks. At its core, our proposed approach improves model selection in terms of the expected final model performance without introducing overoptimism. We furthermore observed a higher probability of a successful evaluation study, making it easier in practice to empirically demonstrate a sufficiently high predictive performance.
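The idea of selecting on the test data and then correcting for that selection can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes classification accuracy as the performance measure, a hypothetical performance threshold acc0, and selection of the candidate with the highest empirical test accuracy. The critical value is taken as the (1 - alpha) quantile of the maximum of a multivariate normal vector whose correlation matrix is estimated from the per-observation correctness of all candidates on the shared test set. Here that quantile is approximated by Monte Carlo simulation. The helper name select_and_evaluate is hypothetical.

```python
import numpy as np


def select_and_evaluate(correct, acc0=0.8, alpha=0.05, n_sim=100_000, seed=1):
    """Hypothetical helper: pick the final model on the test data and test
    H0_j: acc_j <= acc0 with a maxT-type multiplicity correction.

    correct: (n, m) array of 0/1; correct[i, j] == 1 if candidate model j
             classifies test observation i correctly.
    Assumes every column has nonzero variance (no model is always right/wrong).
    """
    correct = np.asarray(correct, dtype=float)
    n, m = correct.shape
    acc = correct.mean(axis=0)                                 # empirical test accuracies
    cov = np.atleast_2d(np.cov(correct, rowvar=False)) / n     # covariance of the accuracy estimates
    se = np.sqrt(np.diag(cov))                                 # standard errors
    corr = cov / np.outer(se, se)                              # correlation of the estimates

    # Monte Carlo approximation of the (1 - alpha) quantile of max_j Z_j,
    # Z ~ N(0, corr): a one-sided, equicoordinate critical value.
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(m), corr, size=n_sim)
    crit = np.quantile(z.max(axis=1), 1 - alpha)

    t = (acc - acc0) / se                                      # Wald-type test statistics
    lower = acc - crit * se                                    # simultaneous lower confidence bounds
    best = int(np.argmax(acc))                                 # assumed rule: best empirical accuracy
    return {"best": best, "acc": acc[best], "lower": lower[best],
            "crit": crit, "reject": bool(t[best] > crit)}
```

Because the critical value covers all m candidates simultaneously, the bound for the selected model stays valid even though that model was picked on the same test data. With a single candidate it reduces to the usual one-sided normal quantile (about 1.645 for alpha = 0.05). With several correlated candidates, it never exceeds the corresponding Bonferroni value.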
Pages: 10
Related papers
50 in total
  • [1] A Test for Selection Employing Quantitative Trait Locus and Mutation Accumulation Data
    Rice, Daniel P.
    Townsend, Jeffrey P.
    GENETICS, 2012, 190 (04) : 1533 - +
  • [2] Improving Data Quality for Regression Test Selection by Reducing Annotation Noise
    Al-Sabbagh, Khaled Walid
    Staron, Miroslaw
    Hebig, Regina
    Meding, Wilhelm
    2020 46TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2020), 2020, : 191 - 194
  • [3] Uncertainty with the Gamma Test for Model Input Data Selection
    Han, Dawei
    Yan, Weizhong
    Nia, Alireza Moghaddam
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [4] A model selection test for bivariate failure-time data
    Chen, Xiaohong
    Fan, Yanqin
    ECONOMETRIC THEORY, 2007, 23 (03) : 414 - 439
  • [5] An efficient test pattern selection method for improving defect coverage with reduced test data volume and test application time
    Wang, Zhanglei
    Chakrabarty, Krishnendu
    PROCEEDINGS OF THE 15TH ASIAN TEST SYMPOSIUM, 2006, : 333 - +
  • [6] Model-driven Data Layout Selection for Improving Read Performance
    Liu, Jialin
    Byna, Surendra
    Dong, Bin
    Wu, Kesheng
    Chen, Yong
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1708 - 1716
  • [7] DynImpt: A Dynamic Data Selection Method for Improving Model Training Efficiency
    Huang, Wei
    Zhang, Yunxiao
    Guo, Shangmin
    Shang, Yu-Ming
    Fu, Xiangling
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (01) : 239 - 252
  • [8] Model order selection for short data: An exponential fitting test (EFT)
    Quinlan, Angela
    Barbot, Jean-Pierre
    Larzabal, Pascal
    Haardt, Martin
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2007, 2007 (1)
  • [9] A METHOD FOR TEST DATA SELECTION
    VELASCO, FRD
    JOURNAL OF SYSTEMS AND SOFTWARE, 1987, 7 (02) : 89 - 97