Enrichment or depletion of a GO category within a class of genes: which test?

Cited by: 494
Authors
Rivals, Isabelle
Personnaz, Leon
Taing, Lieng
Potier, Marie-Claude
Affiliations
[1] Ecole Super Phys & Chim Ind Ville Paris, Equipe Stat Appl, F-75005 Paris, France
[2] Ecole Super Phys & Chim Ind Ville Paris, Lab Neurobiol & Divers Cellulaire, F-75005 Paris, France
DOI
10.1093/bioinformatics/btl633
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest. Whereas a correct formulation of the problem leads to a single exact null distribution, these GO tools use a large variety of statistical tests whose denominations often do not clarify the underlying P-value computations.
Summary: We review the different formulations of the problem and the tests they lead to: the binomial, χ², equality of two probabilities, Fisher's exact and hypergeometric tests. We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test. We recall that the other tests are valid only for large samples, the test of equality of two probabilities and the χ²-test being equivalent. We discuss the appropriateness of one- and two-sided P-values, as well as some discreteness and conservatism issues.
Contact: isabelle.rivals@espci.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
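The two equivalences stated in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the 2×2 table layout (rows: class of interest vs. remaining genes; columns: in the GO category vs. not) and the counts `N`, `K`, `n`, `k` are all illustrative assumptions. It computes the one-sided enrichment P-value once from the hypergeometric distribution and once from Fisher's exact test on the corresponding table, then compares the uncorrected Pearson χ² statistic with the squared z statistic of the pooled two-proportion test.

```python
from fractions import Fraction
from math import comb, sqrt


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the GO category (exact rational arithmetic)."""
    return sum(
        Fraction(comb(K, i) * comb(N - K, n - i), comb(N, n))
        for i in range(k, min(K, n) + 1)
    )


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact P-value for the table [[a, b], [c, d]]:
    sum over tables with the same margins whose top-left cell is >= a."""
    row1, row2, col1 = a + b, c + d, a + c
    return sum(
        Fraction(comb(row1, x) * comb(row2, col1 - x), comb(row1 + row2, col1))
        for x in range(a, min(row1, col1) + 1)
    )


def chi2_statistic(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table, no continuity correction."""
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def two_proportion_z(a, b, c, d):
    """z statistic for equality of two proportions, using the pooled variance."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    return (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


# Hypothetical counts: N genes in total, K in the category,
# n in the class of interest, k in both.
N, K, n, k = 1000, 50, 100, 12
table = (k, n - k, K - k, N - K - n + k)

p_hyper = hypergeom_upper_tail(k, N, K, n)
p_fisher = fisher_upper_tail(*table)
chi2 = chi2_statistic(*table)
z = two_proportion_z(*table)

print("hypergeometric P(X >= k):", float(p_hyper))
print("Fisher one-sided P-value:", float(p_fisher))
print("identical:", p_hyper == p_fisher)
print("chi-squared:", chi2, " z squared:", z ** 2)
```

Exact fractions are used for the two tail probabilities so that their equality can be tested without floating-point tolerance; the χ² and z² values agree only up to rounding, since they are computed in floating point.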
Pages: 401-407
Page count: 7