Optimal ratio for data splitting

Cited by: 300
Author
Joseph, V. Roshan [1]
Affiliation
[1] Georgia Institute of Technology, H. Milton Stewart School of Industrial and Systems Engineering, Atlanta, GA 30332, USA
Funding
U.S. National Science Foundation
Keywords
testing; training; validation; calibration; models
DOI
10.1002/sam.11583
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and how much for testing. In this article, we show that the optimal training/testing splitting ratio is √p : 1, where p is the number of parameters in a linear regression model that explains the data well.
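A √p : 1 training/testing ratio means the training fraction is √p / (√p + 1). For example, p = 9 gives 3 : 1, i.e. 75% training and 25% testing. The following is a minimal sketch of that arithmetic, not the article's own procedure. It assumes p is already known. The helpers `optimal_train_fraction` and `split_indices` are hypothetical names introduced here. The simple random shuffle used to assign points is also an assumption; the abstract states only the ratio, not how individual points are chosen.

```python
import math
import random


def optimal_train_fraction(p: int) -> float:
    """Training fraction implied by a sqrt(p):1 training/testing ratio.

    p is the number of parameters in a linear regression model assumed to
    explain the data well (as in the abstract).
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    root_p = math.sqrt(p)
    # sqrt(p) : 1  ->  training share = sqrt(p) / (sqrt(p) + 1)
    return root_p / (root_p + 1.0)


def split_indices(n: int, p: int, seed: int = 0) -> tuple[list[int], list[int]]:
    """Split indices 0..n-1 into (train, test) sized by the sqrt(p):1 ratio.

    Assumption for illustration: points are assigned by a plain random
    shuffle; the article may select points differently.
    """
    n_train = round(n * optimal_train_fraction(p))
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    return idx[:n_train], idx[n_train:]


if __name__ == "__main__":
    for p in (1, 4, 9, 100):
        print(p, round(optimal_train_fraction(p), 3))
    train, test = split_indices(1000, 9)
    print(len(train), len(test))
```

With p = 1 the ratio is 1 : 1 (a 50/50 split). As p grows, the training share rises toward 1, but slowly, because it grows with √p rather than p.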
Pages: 531-538
Number of pages: 8