Sequence count data are poorly fit by the negative binomial distribution

Cited by: 25
Authors
Hawinkel, Stijn [1]
Rayner, J. C. W. [2, 5]
Bijnens, Luc [3, 4]
Thas, Olivier [1, 4, 5]
Affiliations
[1] Univ Ghent, Dept Data Anal & Math Modelling, Ghent, Belgium
[2] Univ Newcastle, Ctr Comp Assisted Res Math & Its Applicat, Sch Math & Phys Sci, Newcastle, NSW, Australia
[3] Janssen Pharmaceut Co Johnson & Johnson, Quantitat Sci, Ghent, Belgium
[4] Hasselt Univ, I BioStat, Hasselt, Belgium
[5] Univ Wollongong, Natl Inst Appl Stat Res Australia NIASRA, Wollongong, NSW, Australia
Source
PLOS ONE | 2020 / Vol. 15 / Issue 04
Keywords
GOODNESS-OF-FIT; RNA-SEQ DATA; MODELS
DOI
10.1371/journal.pone.0224909
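Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09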
Abstract
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods.
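To make the idea of checking the NB assumption concrete, here is a minimal sketch in Python. It is not the goodness-of-fit test proposed in the paper, which works within regression models. It is a crude single-feature diagnostic that assumes no covariates: fit the NB by the method of moments, then compare the observed fraction of zeros with the fraction the fitted NB predicts. The function names `nb_moment_fit`, `nb_pmf`, and `zero_fraction_gap` are hypothetical and introduced here only for illustration.

```python
import math
from statistics import mean, variance


def nb_moment_fit(counts):
    """Method-of-moments estimates (mu, size) for a negative binomial.

    Uses the parametrisation Var = mu + mu^2 / size (an assumption of this sketch).
    """
    mu = mean(counts)
    var = variance(counts)  # unbiased sample variance
    if var <= mu:
        # Without overdispersion the moment estimate of `size` is undefined.
        raise ValueError("sample variance does not exceed the mean")
    size = mu * mu / (var - mu)
    return mu, size


def nb_pmf(k, mu, size):
    """NB probability of observing count k, given mean mu and size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zero_fraction_gap(counts):
    """Return (observed, expected) zero fractions under the moment-fitted NB."""
    mu, size = nb_moment_fit(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = nb_pmf(0, mu, size)
    return observed, expected


if __name__ == "__main__":
    # Made-up counts for one feature across ten samples (illustrative only).
    example = [0, 0, 0, 0, 1, 2, 3, 10, 20, 50]
    obs, exp = zero_fraction_gap(example)
    print(f"observed zero fraction: {obs:.3f}, NB-expected: {exp:.3f}")
```

A large gap between the two fractions hints at a poor fit for that feature, but it is not a calibrated test: a formal procedure would need a null distribution for the discrepancy and would have to account for covariates and library sizes.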
Pages: 16
Related papers
50 in total
  • [1] Analysis of frequency count data using the negative binomial distribution
    White, GC
    Bennetts, RE
    ECOLOGY, 1996, 77 (08) : 2549 - 2557
  • [2] Regression models for count data based on the negative binomial(p) distribution
    Hardin, James W.
    Hilbe, Joseph M.
    STATA JOURNAL, 2014, 14 (02) : 280 - 291
  • [3] Using the negative binomial distribution to model overdispersion in ecological count data
    Linden, Andreas
    Mantyniemi, Samu
    ECOLOGY, 2011, 92 (07) : 1414 - 1421
  • [4] Applying Negative Binomial Distribution in Diagnostic Classification Models for Analyzing Count Data
    Liu, Ren
    Heo, Ihnwhi
    Liu, Haiyan
    Shi, Dexin
    Jiang, Zhehan
    APPLIED PSYCHOLOGICAL MEASUREMENT, 2022, : 64 - 75
  • [5] Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data
    Li, Qiwei
    Cassese, Alberto
    Guindani, Michele
    Vannucci, Marina
    BIOMETRICS, 2019, 75 (01) : 183 - 192
  • [6] Anscombe's Tests of Fit for the Negative Binomial Distribution
    Best, D. J.
    Rayner, J. C. W.
    Thas, O.
    JOURNAL OF STATISTICAL THEORY AND PRACTICE, 2009, 3 (03) : 555 - 565
  • [8] Functional forms for the negative binomial model for count data
    Greene, William
    ECONOMICS LETTERS, 2008, 99 (03) : 585 - 590
  • [9] Goodness-of-Fit Test for the Bivariate Negative Binomial Distribution
    Novoa-Munoz, Francisco
    Aguirre-Gonzalez, Juan Pablo
    AXIOMS, 2025, 14 (01)
  • [10] Negative Binomial-Reciprocal Inverse Gaussian Distribution: Statistical Properties with Applications in Count Data
    Hassan, Anwar
    Shah, Ishfaq
    Peer, Bilal
    THAILAND STATISTICIAN, 2021, 19 (03) : 437 - 449