A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

Cited by: 15
Authors
van Ravenzwaaij, Don [1]
Ioannidis, John P. A. [2, 3, 4, 5]
Affiliations
[1] Univ Groningen, Dept Psychol, Groningen, Netherlands
[2] Stanford Univ, Dept Med, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[5] Stanford Univ, Meta Res Innovat Ctr Stanford METRICS, Stanford, CA 94305 USA
Source
PLOS ONE | 2017 / Vol. 12 / Issue 03
Keywords
P-VALUES; CONFIDENCE-INTERVALS; RANDOMIZED-TRIALS; CLINICAL-TRIALS; HYPOTHESIS; TESTS;
DOI
10.1371/journal.pone.0173184
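Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes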
07 ; 0710 ; 09 ;
Abstract
A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. "Convincing" may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular with p < .05 being coined significant) is pervasive in clinical research. Therefore, in this paper, we calculate with simulations what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved and they fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications.
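To make the gap between "p < .05" and strength of evidence concrete, the following is a minimal sketch, not the authors' simulation. It swaps in the BIC approximation to the Bayes factor for a two-sample t-test (an assumption made here for simplicity; the abstract does not specify how the Bayes factors were computed). The function name `bf10_bic` and the trial sizes and t statistics are hypothetical illustrations.

```python
import math


def bf10_bic(t: float, n_total: int) -> float:
    """Approximate Bayes factor (alternative over null) from a two-sample t statistic.

    Uses the BIC approximation: BF10 ~= (1 + t^2 / df)^(N / 2) / sqrt(N),
    with df = N - 2. This is an illustrative stand-in, not the paper's method.
    """
    if n_total < 3:
        raise ValueError("need at least 3 observations in total")
    df = n_total - 2
    # Work on the log scale to avoid overflow for large t or N.
    log_bf10 = 0.5 * n_total * math.log1p(t * t / df) - 0.5 * math.log(n_total)
    return math.exp(log_bf10)


# Hypothetical trial: 100 participants per group, t close to the two-sided .05 cutoff.
barely_significant = bf10_bic(t=1.972, n_total=200)
# Hypothetical trial: same size, much larger t statistic.
clearly_significant = bf10_bic(t=4.0, n_total=200)

print(f"BF10 at t = 1.972: {barely_significant:.2f}")
print(f"BF10 at t = 4.000: {clearly_significant:.2f}")
```

Under this approximation, a trial of this size that only just reaches p < .05 yields a Bayes factor below 1, meaning the data lean slightly toward the null, while the larger t statistic yields a Bayes factor well above 20. Both results count as "significant", which is the kind of divergence the abstract describes.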
Pages: 16
Related papers
50 in total
  • [21] Assessment of the strength of recommendation and quality of evidence: GRADE checklist. A descriptive study
    Bezerra, Camila Torres
    Grande, Antonio Jose
    Galvao, Vivianny Kelly
    Marin dos Santos, Douglas Henrique
    Atallah, Alvaro Nagib
    Silva, Valter
    SAO PAULO MEDICAL JOURNAL, 2022, 140 (06) : 829 - 836
  • [22] Calculation of limits for significant bidirectional changes in two or more serial results of a biomarker based on a computer simulation model
    Lund, Flemming
    Petersen, Per Hyltoft
    Fraser, Callum G.
    Soletormos, Gyorgy
    ANNALS OF CLINICAL BIOCHEMISTRY, 2015, 52 (04) : 434 - 440
  • [23] Calculation of limits for significant unidirectional changes in two or more serial results of a biomarker based on a computer simulation model
    Lund, Flemming
    Petersen, Per Hyltoft
    Fraser, Callum G.
    Soletormos, Gyorgy
    ANNALS OF CLINICAL BIOCHEMISTRY, 2015, 52 (02) : 237 - 244
  • [24] Strength of Evidence of Noninferiority Trials with the Two Confidence Interval Method with Random Margin
    Wang, So-Young
    Kang, Seung-Ho
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2013, 23 (02) : 307 - 321
  • [25] Enhanced personalized recommendation system for machine learning public datasets: generalized modeling, simulation, significant results and analysis
    Bhaskaran S.
    Marappan R.
    International Journal of Information Technology, 2023, 15 (3) : 1583 - 1595
  • [26] Statistically significant differences versus convincing evidence of real treatment effects: an analysis of the false positive risk for single-centre trials in anaesthesia
    Sidebotham, David
    Dominick, Felicity
    Deng, Carolyn
    Barlow, Jake
    Jones, Philip M.
    BRITISH JOURNAL OF ANAESTHESIA, 2024, 132 (01) : 116 - 123
  • [27] A Meta-Epidemiological Study of Positive Results in Clinical Nutrition Research: The Good, the Bad and the Ugly of Statistically Significant Findings
    Gkiouras, Konstantinos
    Choleva, Maria-Eleftheria
    Verrou, Aikaterini
    Goulis, Dimitrios G.
    Bogdanos, Dimitrios P.
    Grammatikopoulou, Maria G.
    NUTRIENTS, 2022, 14 (23)
  • [28] ELM-Based Large-Scale Genetic Association Study via Statistically Significant Pattern
    Li, Yuan
    Zhao, Yuhai
    Wang, Guoren
    Wang, Zhanghui
    Gao, Min
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2019, 49 (10) : 2175 - 2188
  • [29] Do industry-sponsored randomized controlled drug trials done in nursing homes (NH) have more statistically significant results and better quality?
    Cheng, H.
    JOURNAL OF THE AMERICAN GERIATRICS SOCIETY, 2007, 55 (04) : S35 - S35
  • [30] “Spin” in wound care research: the reporting and interpretation of randomized controlled trials with statistically non-significant primary outcome results or unspecified primary outcomes
    Suzanne Lockyer
    Rob Hodgson
    Jo C Dumville
    Nicky Cullum
    Trials, 14