Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Cited by: 23
Authors
Mi, Gu [1]
Di, Yanming [1, 2]
Emerson, Sarah [1]
Cumbie, Jason S. [2, 3]
Chang, Jeff H. [2, 3, 4]
Affiliations
[1] Oregon State Univ, Dept Stat, Corvallis, OR 97331 USA
[2] Oregon State Univ, Mol & Cellular Biol Program, Corvallis, OR 97331 USA
[3] Oregon State Univ, Dept Bot & Plant Pathol, Corvallis, OR 97331 USA
[4] Oregon State Univ, Ctr Genome Res & Biocomp, Corvallis, OR 97331 USA
Source
PLOS ONE | 2012 / Vol. 7 / Issue 10
Funding
National Institutes of Health (USA); National Institute of Food and Agriculture (USA)
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; RNA-SEQ DATA; TOOLS; GRAPH;
DOI
10.1371/journal.pone.0046128
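Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract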
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
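The adjustment described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes a binary differential expression call as the response, with Gene Ontology membership and centered log transcript length as covariates. The helper names (`sigmoid`, `solve`, `fit_logistic`), the simulated data, and all effect sizes are hypothetical. In the simulation, the differential expression call depends on length only, so any apparent association with category membership is due to confounding by length.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X is a list of rows that already include the intercept column;
    y is a list of 0/1 responses. Returns the coefficient list.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Simulated genes (all values are assumptions for illustration only).
random.seed(0)
n_genes = 4000
log_len = [random.gauss(0.0, 1.0) for _ in range(n_genes)]  # centered log length
# Longer genes are more likely to belong to the GO category ...
go = [int(random.random() < sigmoid(-1.0 + 1.0 * l)) for l in log_len]
# ... and more likely to be called differentially expressed (length bias).
# The DE call does NOT depend on GO membership.
de = [int(random.random() < sigmoid(-1.5 + 1.5 * l)) for l in log_len]

# Model 1: DE call ~ GO membership (length ignored).
unadjusted = fit_logistic([[1.0, g] for g in go], de)
# Model 2: DE call ~ GO membership + log transcript length.
adjusted = fit_logistic([[1.0, g, l] for g, l in zip(go, log_len)], de)

print("GO coefficient, unadjusted:", round(unadjusted[1], 3))
print("GO coefficient, length-adjusted:", round(adjusted[1], 3))
print("log-length coefficient:", round(adjusted[2], 3))
```

Under these assumptions, the membership coefficient in the unadjusted model absorbs the length effect, while the adjusted model attributes it to the length covariate. In practice one would likely use an established routine such as R's `glm` or a statistics library instead of the hand-written solver above.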
Pages: 10
Related papers
50 items in total
  • [21] Sentiment Analysis Using Multinomial Logistic Regression
    Ramadhan, W. P.
    Novianty, Astri
    Setianingsih, Casi
    2017 INTERNATIONAL CONFERENCE ON CONTROL, ELECTRONICS, RENEWABLE ENERGY AND COMMUNICATIONS (ICCREC), 2017, : 46 - 49
  • [22] Characteristics of Cyclist Crashes Using Polytomous Latent Class Analysis and Bias-Reduced Logistic Regression
    Sekiguchi, Yuta
    Tanishita, Masayoshi
    Sunaga, Daisuke
    SUSTAINABILITY, 2022, 14 (09)
  • [23] DIF detection using logistic discriminant analysis and polytomous logistic regression
    Hidalgo, MD
    Gómez-Benito, J
    Padilla, JL
    PSICOTHEMA, 2000, 12 : 298 - 300
  • [24] Correction to: Feedback through emotion extraction using logistic regression and CNN
    Mohit Ranjan Panda
    Sarthak Saurav Kar
    Aakash Kumar Nanda
    Rojalina Priyadarshini
    Susmita Panda
    Sukant Kishoro Bisoy
    The Visual Computer, 2022, 38 : 1989 - 1989
  • [25] Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis
    Liu, Cheng
    Wong, Hau San
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (01) : 312 - 321
  • [26] Reconstruction, Topological and Gene Ontology Enrichment Analysis of Cancerous Gene Regulatory Network Modules
    Raza, Khalid
    CURRENT BIOINFORMATICS, 2016, 11 (02) : 243 - 258
  • [27] An Improved Lexicon using Logistic Regression for Sentiment Analysis
    Bhargava, Kunal
    Katarya, Rahul
    2017 INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES FOR SMART NATION (IC3TSN), 2017, : 332 - 337
  • [28] Binary Response Analysis Using Logistic Regression in Dentistry
    Srimaneekarn, Natchalee
    Hayter, Anthony
    Liu, Wei
    Tantipoj, Chanita
    INTERNATIONAL JOURNAL OF DENTISTRY, 2022, 2022
  • [29] EEG Signal Analysis Using PCA and Logistic Regression
    Soeiro, Celine F. C.
    XXVI BRAZILIAN CONGRESS ON BIOMEDICAL ENGINEERING, CBEB 2018, VOL. 2, 2019, 70 (02) : 175 - 180
  • [30] FACTORS AFFECTING TUBERCULOSIS ANALYSIS USING LOGISTIC REGRESSION
    Matrood, Hadiya H.
    Talib, Hayder R.
    Ahmed, Ali H.
    Hussein, Mandi A.
    INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2021, 17 : 1349 - 1356