Computational method for discovery of biomarker signatures from large, complex data sets

Cited by: 2
Authors
Makarov, Vladimir [1, 2]
Gorlin, Alex [2]
Affiliations
[1] Calif State Univ Channel Isl, Camarillo, CA 93012 USA
[2] IFXworks LLC, 2915 Columbia Pike, Arlington, VA 22204 USA
Keywords
Biomarker; Microarray; Gene expression; Chemical; Classification; Translational bioinformatics; Selection
DOI
10.1016/j.compbiolchem.2018.07.008
Chinese Library Classification (CLC)
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
We present an efficient method for identifying reliable biomarker panels in large multivariate data sets, such as those produced by experiments that monitor changes in RNA, small-molecule, or protein abundance. The methodology was developed and validated on the toxicogenomics database Drug Matrix, whose largest category contains 1656 recognition targets, each characterized by the toxicant, dose, and time (or duration) of exposure. We were able to recognize both individual experimental conditions (compound, dose, and time combinations) and cases where the dose and time values fall within the intervals covered by the training data but do not match it exactly. Including gene expression information from multiple organs improved recognition accuracy. Taking time-response information into account allowed us to develop particularly accurate marker panels for a large number of targets: we were able to recognize 176 of 316 compounds with greater than 90% accuracy. The methodology has an immediate application in the discovery of diagnostic biomarker panels for exposure to various toxicity hazards, and may also be useful for developing biological markers for medical applications.
Pages: 161 - 168
Page count: 8
Related papers
50 in total
  • [21] A MCMC Method for Bayesian System Identification from Large Data Sets
    Green, P. L.
    MODEL VALIDATION AND UNCERTAINTY QUANTIFICATION, VOL 3, 2015, : 275 - 281
  • [22] Computational techniques for spatial logistic regression with large data sets
    Paciorek, Christopher J.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (08) : 3631 - 3653
  • [23] A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets
    Appice, Annalisa
    Ceci, Michelangelo
    Turi, Antonio
    Malerba, Donato
    INTELLIGENT DATA ANALYSIS, 2011, 15 (01) : 69 - 88
  • [24] Measuring Similarity of Complex and Heterogeneous Data in Clustering of Large Data Sets
    Bacelar-Nicolau, Helena
    Nicolau, Fernando
    Sousa, Aurga
    Bacelar-Nicolau, Leonor
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2009, 29 (02) : 9 - 18
  • [25] Knowledge discovery for large data sets using artificial neural network
    Shobha, Gangadhara T.
    Sharma, Sreenivasa C.
    Doreswamy
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2005, 1 (04) : 635 - 642
  • [26] New breast cancer genes - Discovery at the intersection of complex data sets
    Nevins, Joseph R.
    CANCER CELL, 2007, 12 (06) : 497 - 499
  • [27] Biomarker Signature Discovery from Mass Spectrometry Data
    Kong, Ao
    Gupta, Chinmaya
    Ferrari, Mauro
    Agostini, Marco
    Bedin, Chiara
    Bouamrani, Ali
    Tasciotti, Ennio
    Azencott, Robert
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 766 - 772
  • [28] From visualisation to data mining with large data sets
    Adelmann, A
    Ryne, RD
    Shalf, JM
    Siegerist, C
    2005 IEEE PARTICLE ACCELERATOR CONFERENCE (PAC), VOLS 1-4, 2005, : 542 - 544
  • [29] On fuzzy-knowledge discovery from data sets
    Maddouri, M
    PROCEEDINGS OF THE FIFTH JOINT CONFERENCE ON INFORMATION SCIENCES, VOLS 1 AND 2, 2000, : 272 - 275
  • [30] Discovery and validation of multidimensional biomarker sets in a large single site rheumatoid arthritis patient registry
    Shadick, N
    Weinblatt, ME
    Maher, NE
    Solomon, DS
    Coblyn, JC
    Anderson, RJ
    Meyer, J
    Parker, A
    Chun, M
    Fedyk, E
    Singh, A
    Bryce, J
    Ginsburg, G
    Lekstrom-Himes, J
    ARTHRITIS AND RHEUMATISM, 2004, 50 (09) : S162 - S163