A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

Cited by: 15
Authors
van Ravenzwaaij, Don [1]
Ioannidis, John P. A. [2, 3, 4, 5]
Affiliations
[1] Univ Groningen, Dept Psychol, Groningen, Netherlands
[2] Stanford Univ, Dept Med, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[5] Stanford Univ, Meta Res Innovat Ctr Stanford METRICS, Stanford, CA 94305 USA
Source
PLOS ONE | 2017 / Vol. 12 / Issue 03
Keywords
P-VALUES; CONFIDENCE-INTERVALS; RANDOMIZED-TRIALS; CLINICAL-TRIALS; HYPOTHESIS; TESTS;
DOI
10.1371/journal.pone.0173184
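Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code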
07 ; 0710 ; 09 ;
Abstract
A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. "Convincing" may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular with p < .05 being coined significant) is pervasive in clinical research. Therefore, in this paper, we calculate with simulations what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved and they fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications.
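The comparison described in the abstract can be illustrated with a small simulation. The sketch below is a simplified stand-in, not the authors' procedure: it assumes a two-group trial with known unit variance (a z-test instead of a t-test), counts a trial as significant when its two-sided p-value is below .05 and the effect favors the treatment, and computes a closed-form Bayes factor on the pooled data under a normal prior on the effect size with a hypothetical scale `tau = 1`. All function names and parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for p-values


def simulate_trial(n_per_group, effect_size, rng):
    """Observed mean difference (treatment minus control) in SD units."""
    treatment = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return sum(treatment) / n_per_group - sum(control) / n_per_group


def two_sided_p(diff, n_per_group):
    """Two-sided z-test p-value, assuming known unit variance in both groups."""
    z = diff / math.sqrt(2.0 / n_per_group)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def bf10(diff, var_diff, tau=1.0):
    """Bayes factor for H1: delta ~ N(0, tau^2) against H0: delta = 0.

    Assumes diff ~ N(delta, var_diff); tau is a hypothetical prior scale.
    """
    var_h1 = var_diff + tau ** 2  # marginal variance of diff under H1
    exponent = 0.5 * diff ** 2 * (1.0 / var_diff - 1.0 / var_h1)
    return math.sqrt(var_diff / var_h1) * math.exp(exponent)


def pooled_bf_when_two_significant(n_trials, n_per_group, effect_size,
                                   n_sets, seed=1):
    """Pooled Bayes factors for trial sets with exactly two significant trials."""
    rng = random.Random(seed)
    bayes_factors = []
    for _ in range(n_sets):
        diffs = [simulate_trial(n_per_group, effect_size, rng)
                 for _ in range(n_trials)]
        n_sig = sum(1 for d in diffs
                    if d > 0 and two_sided_p(d, n_per_group) < 0.05)
        if n_sig == 2:
            pooled_diff = sum(diffs) / n_trials  # equal group sizes
            pooled_var = 2.0 / (n_per_group * n_trials)
            bayes_factors.append(bf10(pooled_diff, pooled_var))
    return bayes_factors


if __name__ == "__main__":
    # Illustrative settings only: 5 trials, 20 participants per group,
    # true effect size of 0.2 standard deviations.
    bfs = pooled_bf_when_two_significant(5, 20, 0.2, n_sets=2000)
    if bfs:
        share = sum(bf >= 20 for bf in bfs) / len(bfs)
        print(f"{len(bfs)} sets retained; share with BF10 >= 20: {share:.2f}")
```

Pooling all trials, including the non-significant ones, is what allows the Bayes factor to fall below 1 in this sketch when many trials are run and only two reach significance. A t-test-based Bayes factor would behave differently in detail, so the numbers produced here should not be read as a reproduction of the paper's results.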
Pages: 16
Related papers
50 in total
  • [31] "Spin" in wound care research: the reporting and interpretation of randomized controlled trials with statistically non-significant primary outcome results or unspecified primary outcomes
    Lockyer, Suzanne
    Hodgson, Rob
    Dumville, Jo C.
    Cullum, Nicky
    TRIALS, 2013, 14
  • [32] Dichotomising outcome variables in clinical trials: Results of a simulation study.
    Shepstone, L
    ARTHRITIS AND RHEUMATISM, 2002, 46 (09) : S114 - S114
  • [33] Evidence-based medicine and treatment of hypertension in women: results of trials
    Hayes, SN
    JOURNAL OF HYPERTENSION, 2002, 20 : S47 - S51
  • [34] Evidence-based clinical guidelines: a new system to better determine true strength of recommendation
    Roddy, E
    Zhang, WY
    Doherty, M
    Arden, NK
    Barlow, J
    Birrell, F
    Carr, A
    Chakravarty, K
    Dickson, J
    Hay, E
    Hosie, G
    Hurley, M
    Jordan, KM
    McCarthy, C
    McMurdo, M
    Mockett, S
    O'Reilly, S
    Peat, G
    Pendleton, A
    Richards, S
    JOURNAL OF EVALUATION IN CLINICAL PRACTICE, 2006, 12 (03) : 347 - 352
  • [35] Presentation of automated procedural guidance in surgical simulation: results of two randomised controlled trials
    Wijewickrema, S.
    Zhou, Y.
    Ioannou, I.
    Copson, B.
    Piromchai, P.
    Yu, C.
    Briggs, R.
    Bailey, J.
    Kennedy, G.
    O'Leary, S.
    JOURNAL OF LARYNGOLOGY AND OTOLOGY, 2018, 132 (03) : 257 - 263
  • [36] Assessment of spin in the abstracts of randomized controlled trials in dental caries with statistically nonsignificant results for primary outcomes: A methodological study
    Su, Naichuan
    van der Linden, Michiel W.
    Faggion, Clovis M.
    van der Heijden, Geert J. M. G.
    CARIES RESEARCH, 2023, 57 (5-6) : 553 - 562
  • [37] Pre-emption dimensional study for obtaining statistically significant results for the variation of γ-glutamyl-transferase during ovarian stimulation
    Tica, Vlad I.
    Mares, Pierre
    Teren, Ovidiu
    Tica, Irina
    Tica, Andrei A.
    JOURNAL OF GASTROINTESTINAL AND LIVER DISEASES, 2007, 16 (01) : 53 - 55
  • [38] The Continuous Fragility Index of Statistically Significant Findings in Studies Based on High Levels of Evidence Comparing Interventions for Femoroacetabular Impingement Syndrome
    Villarreal-Espinosa, Juan Bernardo
    Khan, Zeeshan A.
    Jan, Kyleen
    Berreta, Rodrigo Saad
    Murray, Michael J.
    Allende, Felicitas
    Nho, Shane J.
    Chahla, Jorge
    AMERICAN JOURNAL OF SPORTS MEDICINE, 2025,
  • [39] Results of the Laparoscopic Colon Cancer Randomized Trials: An Evidence-Based Review
    Martel, Guillaume
    Boushey, Robin P.
    Marcello, Peter W.
    SEMINARS IN COLON AND RECTAL SURGERY, 2007, 18 (04) : 210 - 219
  • [40] EVIDENCE BASED PURCHASING - UNDERSTANDING RESULTS OF CLINICAL-TRIALS AND SYSTEMATIC REVIEWS
    FAHEY, T
    GRIFFITHS, S
    PETERS, TJ
    BRITISH MEDICAL JOURNAL, 1995, 311 (7012) : 1056 - 1059