Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models

Cited by: 1
Authors
Sauerbrei, Willi [1]
Kipruto, Edwin [1]
Balmford, James [1]
Affiliations
[1] Univ Freiburg, Inst Med Biometry & Stat, Fac Med, Freiburg, Germany
Keywords
Continuous variable; Fractional polynomial; Influential point; Model building; Sample size; Simulated data; CONTINUOUS PREDICTORS; REGRESSION; TRANSFORMATION; STABILITY; VARIABLES; SPLINES;
DOI
10.1186/s41512-023-00145-1
Chinese Library Classification
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Background: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in statistical modeling. For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1, or FP2 functions. Influential points (IPs) and small sample sizes can both have a strong impact on a selected function and MFP model.

Methods: We used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and the MFP model. The approaches use leave-one-out or leave-two-out and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effects of sample size and model replicability, the latter by using three non-overlapping subsamples with the same sample size. For better illustration, a structured profile was used to provide an overview of all analyses conducted.

Results: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model.

Conclusions: For smaller sample sizes, IPs and low power are important reasons that the MFP approach may not be able to identify underlying functional relationships for continuous variables, and selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model which includes continuous variables. In such a case, MFP can be the preferred approach to derive a multivariable descriptive model.
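
The closed test procedure and the leave-one-out check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single positive continuous predictor with a Gaussian outcome, and it assumes the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, a 0.05 significance level, and chi-square approximations with 4, 3, and 2 degrees of freedom for the three tests. The function names (`fp_columns`, `deviance`, `select_function`, `loo_changes`) and the simulated data are hypothetical.

```python
import math
import random
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)       # conventional FP power set (assumed)
CHI2_CRIT_05 = {2: 5.991, 3: 7.815, 4: 9.488}  # chi-square 95% quantiles by df

def fp_columns(xs, powers):
    """Design columns for an FP function of positive x.
    Power 0 means log(x); a repeated power multiplies the term by log(x)."""
    cols, prev = [], None
    for p in powers:
        col = [math.log(x) if p == 0 else x ** p for x in xs]
        if p == prev:
            col = [c * math.log(x) for c, x in zip(col, xs)]
        cols.append(col)
        prev = p
    return cols

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, k):
            f = m[r][i] / m[i][i]
            for c in range(i, k + 1):
                m[r][c] -= f * m[i][c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(m[i][c] * beta[c] for c in range(i + 1, k))
        beta[i] = (m[i][k] - s) / m[i][i]
    return beta

def deviance(xs, ys, powers):
    """Gaussian deviance n*log(RSS/n) of an intercept-plus-FP least-squares fit."""
    n = len(xs)
    cols = [[1.0] * n] + fp_columns(xs, powers)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * c[i] for b, c in zip(beta, cols))) ** 2
              for i, y in enumerate(ys))
    return n * math.log(rss / n)

def select_function(xs, ys):
    """Closed test: best FP2 vs null (4 df), vs linear (3 df), vs best FP1 (2 df)."""
    fp1 = min((deviance(xs, ys, (p,)), (p,)) for p in POWERS)
    fp2 = min((deviance(xs, ys, pq), pq)
              for pq in combinations_with_replacement(POWERS, 2))
    if deviance(xs, ys, ()) - fp2[0] < CHI2_CRIT_05[4]:
        return "null", ()
    if deviance(xs, ys, (1,)) - fp2[0] < CHI2_CRIT_05[3]:
        return "linear", (1,)
    if fp1[0] - fp2[0] < CHI2_CRIT_05[2]:
        return "FP1", fp1[1]
    return "FP2", fp2[1]

def loo_changes(xs, ys):
    """Indices whose removal changes the selected function class (leave-one-out)."""
    full = select_function(xs, ys)[0]
    return [i for i in range(len(xs))
            if select_function(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])[0] != full]

# Simulated example (hypothetical data): a logarithmic effect plus Gaussian noise.
rng = random.Random(0)
xs = [1 + 0.25 * i for i in range(80)]
ys = [math.log(x) + rng.gauss(0, 0.05) for x in xs]
print(select_function(xs, ys))
print(loo_changes(xs, ys))
```

Any index returned by `loo_changes` marks an observation whose deletion alone moves the selection to a different function class, which is one simple way to flag a candidate influential point. A full MFP analysis would also cycle over all predictors with backward elimination, which this sketch omits.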
Pages: 17