Optimal ratio for data splitting

Cited by: 300
Authors
Joseph, V. Roshan [1]
Affiliation
[1] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
Keywords
testing; training; validation; CALIBRATION; VALIDATION; MODELS;
DOI
10.1002/sam.11583
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article, we show that the optimal training/testing splitting ratio is √p : 1, where p is the number of parameters in a linear regression model that explains the data well.
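As a rough illustration of the ratio stated in the abstract, the sketch below converts a training/testing ratio of sqrt(p) to 1 into set sizes. The function name `split_sizes`, the rounding rule, and the guard that keeps at least one point in each set are assumptions made for this example and are not taken from the article; choosing p itself (the number of parameters in a linear regression model that explains the data well) is left to the reader.

```python
import math


def split_sizes(n, p):
    """Return (n_train, n_test) for a sqrt(p) : 1 training/testing ratio.

    n: total number of data points (assumed >= 2)
    p: number of parameters in the linear regression model (assumed >= 1)
    """
    if n < 2 or p < 1:
        raise ValueError("need n >= 2 and p >= 1")
    root_p = math.sqrt(p)
    # A ratio of sqrt(p) : 1 means the training share is sqrt(p) / (sqrt(p) + 1).
    train_fraction = root_p / (root_p + 1.0)
    n_train = round(n * train_fraction)
    # Assumption for this sketch: keep at least one point in each set.
    n_train = min(max(n_train, 1), n - 1)
    return n_train, n - n_train


if __name__ == "__main__":
    # With p = 16 the ratio is 4 : 1, i.e. 80% training and 20% testing.
    print(split_sizes(1000, 16))
```

Under this reading, p = 1 gives an even split, and the training share grows slowly as p increases.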
Pages: 531-538
Number of pages: 8