Analyzing clustered count data with a cluster-specific random effect zero-inflated Conway-Maxwell-Poisson distribution
Cited by: 12
Authors:
Choo-Wosoba, Hyoyoung [1]
Datta, Somnath [2]
Affiliations:
[1] Univ Louisville, Dept Bioinformat & Biostat, Louisville, KY 40202 USA
[2] Univ Florida, Dept Biostat, Gainesville, FL USA
Funding:
National Institutes of Health (USA);
Keywords:
Gaussian-Hermite (G-H) quadrature;
mixed effects model;
next-generation sequencing (NGS) data;
Poisson distribution;
under- and over-dispersions;
REGRESSION;
MODEL;
DOI:
10.1080/02664763.2017.1312299
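Chinese Library Classification (CLC):
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Discipline classification codes:
020208;
070103;
0714;
Abstract: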
Count data analysis techniques have been developed in biological and medical research areas. In particular, zero-inflated versions of parametric count distributions have been used to model the excess zeros that are often present in these assays. The most common count distributions for analyzing such data are the Poisson and the negative binomial. However, a Poisson distribution can only handle equidispersed data, and a negative binomial distribution can only cope with overdispersion. By contrast, a Conway-Maxwell-Poisson (CMP) distribution [4] can handle a wide range of dispersion. We show, with an illustrative data set on next-generation sequencing of maize hybrids, that both underdispersion and overdispersion can be present in genomic data. Furthermore, the maize data set consists of clustered observations; we therefore develop inference procedures for a zero-inflated CMP regression that incorporates a cluster-specific random effect term. Unlike in Gaussian models, the underlying likelihood is computationally challenging. We use a numerical approximation via Gaussian quadrature to circumvent this issue. A test for zero-inflation has also been developed in our setting. Finite-sample properties of our estimators and test have been investigated through extensive simulations. Finally, the statistical methodology has been applied to analyze the maize data mentioned above.
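As a rough illustration of the quadrature step described in the abstract, the following Python sketch evaluates the marginal log-likelihood of one cluster under a zero-inflated CMP model with a normal random intercept, using Gauss-Hermite nodes from NumPy. The parameterization is an assumption made for this example and is not taken from the paper: a log link for the CMP rate, a constant dispersion `nu`, a constant zero-inflation probability `p`, and a truncated series for the CMP normalizing constant. All function names are hypothetical.

```python
import math
import numpy as np

def cmp_log_pmf(y, lam, nu, max_terms=200):
    """Log-pmf of CMP(lam, nu); the normalizing constant is truncated
    at max_terms terms (assumed adequate for moderate lam; needs y < max_terms)."""
    j = np.arange(max_terms)
    # log(j!) for j = 0..max_terms-1 via a cumulative sum of logs
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, max_terms)))))
    log_terms = j * math.log(lam) - nu * log_fact
    m = log_terms.max()
    log_z = m + math.log(np.exp(log_terms - m).sum())  # log-sum-exp
    return y * math.log(lam) - nu * log_fact[y] - log_z

def zicmp_pmf(y, lam, nu, p):
    """Zero-inflated CMP: a point mass at zero with probability p,
    otherwise a CMP(lam, nu) draw."""
    base = math.exp(cmp_log_pmf(y, lam, nu))
    return p + (1.0 - p) * base if y == 0 else (1.0 - p) * base

def cluster_marginal_loglik(ys, etas, nu, p, sigma, n_nodes=20):
    """Marginal log-likelihood of one cluster, integrating a N(0, sigma^2)
    random intercept b out of log(lam_j) = eta_j + b by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for x, w in zip(nodes, weights):
        b = math.sqrt(2.0) * sigma * x  # change of variable for the exp(-x^2) weight
        cond = 1.0
        for y, eta in zip(ys, etas):
            cond *= zicmp_pmf(y, math.exp(eta + b), nu, p)
        total += w * cond
    return math.log(total / math.sqrt(math.pi))
```

Summing `cluster_marginal_loglik` over clusters would give an approximate full log-likelihood that a general-purpose optimizer could maximize. For large `sigma` or strongly underdispersed settings, `max_terms` and `n_nodes` would likely need to be increased.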
Pages: 799-814
Page count: 16