Visualizing High Dimensional Datasets Using Parallel Coordinates: Application to Gene Prioritization

Cited by: 0
Authors
Boogaerts, Thomas [1]
Tranchevent, Leon-Charles [1]
Pavlopoulos, Georgios A. [1]
Aerts, Jan [1]
Vandewalle, Joos [1]
Affiliation
[1] Katholieke Univ Leuven, ESAT SCD SISTA IBBT, KU Leuven Future Hlth Dept, B-3001 Louvain, Belgium
Keywords
data visualization; parallel coordinates; genetic algorithm; gene prioritization
DOI
Not available
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In this paper, we introduce a visualization tool for interactive and efficient exploration of high-dimensional data using parallel coordinates. An algorithm is developed to find an optimal permutation of dimensions, which allows the data miner to immediately see the most important features or irregularities in the dataset. This is implemented as a genetic algorithm based on the travelling salesman problem, using maximal correlation as fitness. Other features of the tool include selection operators to group the data, such as selection by intersection or by angle; orthogonal and density plots complementing the parallel coordinates plot; manual arrangement of the order of the dimensions; the possibility to show all plots necessary to see all dimensional relations; and the display of a certain number of standard deviations for each dimension separately. The tool is applied to multiple gene prioritization cases in search of genes that are relevant to certain genetic disorders. The datasets used are obtained with the MerKator and Endeavour tools and include breast cancer, cataract, Charcot-Marie-Tooth and cardiomyopathy datasets, as well as a dataset relating 29 diseases with 22206 genes. Our tool, manual and data can be downloaded from http://www.toomas.be/parcoord/.
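
The dimension-ordering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that a genetic algorithm based on the travelling salesman problem is used with maximal correlation as fitness. The sketch assumes the fitness is the sum of absolute Pearson correlations between neighbouring axes, and it assumes order crossover, swap mutation and keep-the-fitter-half selection. All function names and parameters (`best_axis_order`, `generations`, `pop_size`, `rate`, `seed`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Absolute pairwise correlations between dimensions (one list per dimension)."""
    d = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(d)] for i in range(d)]


def fitness(order, corr):
    """Assumed fitness: sum of |correlation| between neighbouring axes (an open path)."""
    return sum(corr[a][b] for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Order crossover: copy a slice of p1, fill the remaining slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in p2 if g not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def mutate(order, rng, rate=0.2):
    """Return a copy of `order`, with two axes swapped with probability `rate`."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def best_axis_order(columns, generations=200, pop_size=40, seed=0):
    """Evolve a permutation of dimensions that maximises the assumed fitness."""
    rng = random.Random(seed)
    corr = correlation_matrix(columns)
    d = len(columns)
    population = [rng.sample(range(d), d) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda o: fitness(o, corr), reverse=True)
        elite = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(order_crossover(*rng.sample(elite, 2), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=lambda o: fitness(o, corr))
```

Calling `best_axis_order(columns)` with one list of values per dimension returns a list of dimension indices in the suggested left-to-right axis order. Placing strongly correlated dimensions next to each other tends to reduce line crossings between adjacent axes, which is one plausible reading of why correlation serves as the fitness here.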
Pages: 52-57
Number of pages: 6