Optimal ratio for data splitting

Cited by: 300
Author
Joseph, V. Roshan [1]
Affiliation
[1] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Funding
U.S. National Science Foundation
Keywords
testing; training; validation; CALIBRATION; VALIDATION; MODELS;
DOI
10.1002/sam.11583
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and how much for testing. In this article, we show that the optimal training/testing splitting ratio is √p : 1, where p is the number of parameters in a linear regression model that explains the data well.
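
As an illustration of the ratio stated in the abstract, the following minimal Python sketch converts a √p : 1 training/testing ratio into split fractions and sample counts. The function names `split_fractions` and `split_sizes` are hypothetical and not taken from the article, and rounding the training count to the nearest integer is an assumption made here for convenience.

```python
import math


def split_fractions(p: int) -> tuple[float, float]:
    """Return (train_fraction, test_fraction) for a sqrt(p) : 1 split.

    p is the number of parameters in a linear regression model
    assumed to explain the data well (as described in the abstract).
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    root = math.sqrt(p)
    # A ratio of sqrt(p) : 1 means train = sqrt(p) / (sqrt(p) + 1).
    train = root / (root + 1.0)
    return train, 1.0 - train


def split_sizes(n: int, p: int) -> tuple[int, int]:
    """Return (n_train, n_test) for n samples.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    train_fraction, _ = split_fractions(p)
    n_train = round(n * train_fraction)
    return n_train, n - n_train
```

For example, with p = 9 the ratio is 3 : 1, so `split_fractions(9)` gives a 75% training share and `split_sizes(1000, 9)` assigns 750 samples to training and 250 to testing.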
Pages: 531-538
Page count: 8