Background: Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, currently used approaches either assume some simple parametric model (usually a transformed normal distribution) or estimate the empirical distribution. Neither strategy may be optimal for gene expression data: the non-parametric approach ignores known structural information, whereas fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale).

Results: Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In the case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects.

Conclusions: The quasi-likelihood framework provides a simple and versatile approach to analyzing gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression.
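To make the idea of an extended quasi-likelihood built from a quadratic variance structure concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes the standard extended quasi-likelihood form, Q+(mu; y) = -1/2 log(2 pi phi V(y)) - D(y; mu) / (2 phi), with quasi-deviance D(y; mu) = 2 * integral from mu to y of (y - t) / V(t) dt. The variance function V(mu) = a + b * mu^2 and the names `a`, `b`, `phi`, `deviance`, `extended_quasi_loglik` and `eql_total` are illustrative assumptions; the paper's own parametrization (for example a calibration offset) may differ.

```python
import math


def quadratic_variance(mu, a, b):
    """Assumed variance function V(mu) = a + b * mu**2 with a, b > 0.

    Here `a` stands in for an additive noise variance and `b` for a
    multiplicative one; both names are hypothetical.
    """
    return a + b * mu * mu


def deviance(y, mu, a, b):
    """Quasi-deviance D(y; mu) = 2 * int_mu^y (y - t) / V(t) dt.

    For V(t) = a + b * t**2 the integral has a closed form built from
    arctan and log terms.
    """
    s = math.sqrt(b / a)
    atan_part = y / math.sqrt(a * b) * (math.atan(s * y) - math.atan(s * mu))
    log_part = (
        math.log(quadratic_variance(y, a, b))
        - math.log(quadratic_variance(mu, a, b))
    ) / (2.0 * b)
    return 2.0 * (atan_part - log_part)


def extended_quasi_loglik(y, mu, a, b, phi=1.0):
    """Extended quasi-log-likelihood contribution of one observation."""
    norm = -0.5 * math.log(2.0 * math.pi * phi * quadratic_variance(y, a, b))
    return norm - deviance(y, mu, a, b) / (2.0 * phi)


def eql_total(ys, mu, a, b, phi=1.0):
    """Sum of contributions over replicate readings sharing one mean mu."""
    return sum(extended_quasi_loglik(y, mu, a, b, phi) for y in ys)


if __name__ == "__main__":
    # Made-up replicate intensities, for illustration only.
    ys = [90.0, 100.0, 110.0, 120.0]
    a, b = 100.0, 0.01
    # Crude grid search over mu on the original (linear) scale.
    grid = [80.0 + 0.5 * k for k in range(101)]
    best_mu = max(grid, key=lambda m: eql_total(ys, m, a, b))
    print("mu maximizing the extended quasi-likelihood:", best_mu)
```

Because the derivative of Q+ with respect to mu is the quasi-score (y - mu) / (phi V(mu)), the maximizer over mu for replicates with a common mean is their sample mean, whatever the values of `a` and `b`. Estimating the variance parameters themselves would require maximizing the same function over `a` and `b`, which this sketch does not attempt.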