Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Cited by: 1
Authors
Li, Ying [1]
Wang, Nan [1]
Zhang, Chaoyang [1]
Perkins, Edward J. [2]
Gong, Ping [3]
Affiliations
[1] Univ So Mississippi, Hattiesburg, MS 39401 USA
[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA
[3] SpecPro Inc, Vicksburg, MS 39180 USA
Keywords
Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm microarray; Cancer classification
DOI
10.1109/IJCBS.2009.134
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and cluster (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to infer a total of 869 genes that changed significantly relative to controls as a result of exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance in the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Results of unsupervised clustering show that using the top 100 classifier genes assigns the largest number of the 248 worm samples to the three reference clusters obtained from all 14,188 filtered genes, suggesting that these top-ranked genes may be potential candidates for biomarkers. This study demonstrates that the DAC pipeline can be used to identify a small set of biomarker genes from high-dimensional datasets and to generate a reliable SVM classification model for multiple classes.
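The ranking step mentioned in the abstract (ordering classifier genes by their overall weight across several classification methods) can be illustrated with a minimal sketch. The abstract does not say how the weights were combined, so the scheme below is an assumption: each method's weights are normalized to sum to 1 and then summed per gene. The function name `rank_classifier_genes`, the gene identifiers, and the toy weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_classifier_genes(method_weights):
    """Rank genes by their summed, per-method-normalized weights.

    method_weights: list of dicts, one per classification method,
    mapping gene id -> non-negative weight.
    Normalizing each method to sum to 1 is an assumption made here so
    that no single method dominates the combined score.
    """
    totals = defaultdict(float)
    for weights in method_weights:
        scale = sum(weights.values())
        if scale == 0:
            continue  # a method that selected no genes contributes nothing
        for gene, weight in weights.items():
            totals[gene] += weight / scale
    # Highest combined weight first; ties broken by gene id for determinism.
    return sorted(totals, key=lambda gene: (-totals[gene], gene))


# Toy, made-up weights from three hypothetical methods.
toy_weights = [
    {"geneA": 3.0, "geneB": 1.0},
    {"geneB": 2.0, "geneC": 2.0},
    {"geneA": 1.0, "geneC": 1.0, "geneD": 2.0},
]
ranking = rank_classifier_genes(toy_weights)
top2 = ranking[:2]  # analogous to keeping only the top-ranked genes
```

Taking a prefix of the ranking, as in `top2`, mirrors the idea of comparing subsets such as the top 100 genes against the full set of classifier genes.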
Pages: 23 / +
Page count: 2
Related papers
50 in total
  • [41] Generalized MDS for data exploration, discriminant analysis, clustering and visualization
    Johannsen, DA
    Solka, J
    PROCEEDINGS OF THE 8TH JOINT CONFERENCE ON INFORMATION SCIENCES, VOLS 1-3, 2005, : 1739 - 1742
  • [42] Extraction of informative genes from microarray data
    Paul, Topon Kumar
    Iba, Hitoshi
    GECCO 2005: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOLS 1 AND 2, 2005, : 453 - 460
  • [43] Identifying significant genes from microarray data
    Chuang, HY
    Liu, HF
    Brown, S
    McMunn-Coffran, C
    Kao, CY
    Hsu, DF
    BIBE 2004: FOURTH IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING, PROCEEDINGS, 2004, : 358 - 365
  • [44] Binary state pattern clustering: A digital paradigm for class and biomarker discovery in gene microarray studies of cancer
    Beattie, Bradley J.
    Robinson, Peter N.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2006, 13 (05) : 1114 - 1130
  • [45] Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset
    Li, Ying
    Wang, Nan
    Perkins, Edward J.
    Zhang, Chaoyang
    Gong, Ping
    PLOS ONE, 2010, 5 (10):
  • [46] Clustering analysis of microarray gene expression data with new clustering ensemble method
    Luo, Fei
    Liu, Juan
    PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, PROCEEDINGS, 2007, : 500 - 504
  • [47] Biomarker Discovery based on BBHA and AdaboostM1 on Microarray Data for Cancer Classification
    Pashaei, Elnaz
    Ozen, Mustafa
    Aydin, Nizamettin
    2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2016, : 3080 - 3083
  • [48] Feature Genes Selection of Adult ALL Microarray Data with Affinity Propagation Clustering
    Chuang, Chen-Chia
    Li, Yan-Cheng
    Jeng, Jin-Tsong
    Chang, Chih-Kai
    Wang, Zhi-Qian
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2015, : 230 - 231
  • [49] Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data
    Stingo, Francesco C.
    Vannucci, Marina
    BIOINFORMATICS, 2011, 27 (04) : 495 - 501
  • [50] Bioinformatics analysis of microarray data to identify hub genes, as diagnostic biomarker of HELLP syndrome: System biology approach
    Asadikalameh, Zahra
    Maddah, Reza
    Maleknia, Mohsen
    Nassaj, Zohre S.
    Ali, Neda Seyed
    Azizi, Sepideh
    Dastyar, Fatemeh
    JOURNAL OF OBSTETRICS AND GYNAECOLOGY RESEARCH, 2022, 48 (10) : 2493 - 2504