Visualizing High Dimensional Datasets Using Parallel Coordinates: Application to Gene Prioritization

Cited by: 0
Authors
Boogaerts, Thomas [1 ]
Tranchevent, Leon-Charles [1 ]
Pavlopoulos, Georgios A. [1 ]
Aerts, Jan [1 ]
Vandewalle, Joos [1 ]
Affiliation
[1] Katholieke Univ Leuven, ESAT SCD SISTA IBBT, KU Leuven Future Hlth Dept, B-3001 Louvain, Belgium
Keywords
data visualization; parallel coordinates; genetic algorithm; gene prioritization
DOI
Not available
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In this paper, we introduce a visualization tool for interactive and efficient exploration of high-dimensional data using parallel coordinates. An algorithm is developed to find an optimal permutation of the dimensions, which allows the data miner to immediately see the most important features or irregularities in the dataset. It is implemented as a genetic algorithm based on the travelling salesman problem, using maximal correlation as the fitness criterion. Other features of the tool include selection operators to group the data, such as selection by intersection or by angle; orthogonal and density plots that complement the parallel coordinates plot; manual arrangement of the order of the dimensions; the option to show all plots needed to see all relations between dimensions; and the display of a chosen number of standard deviations for each dimension separately. The tool is applied to multiple gene prioritization cases in search of genes that are relevant to certain genetic disorders. The datasets used were obtained with the MerKator and Endeavour tools and include breast cancer, cataract, Charcot-Marie-Tooth and cardiomyopathy datasets, as well as a dataset relating 29 diseases to 22206 genes. The tool, manual and data can be downloaded from http://www.toomas.be/parcoord/.
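The abstract does not give the details of the axis-ordering algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the fitness of an axis order is the sum of absolute Pearson correlations between neighbouring axes, and that the genetic algorithm uses order crossover and swap mutation as in common travelling-salesman formulations. All function names and parameters (`pearson`, `fitness`, `order_crossover`, `swap_mutation`, `optimise_axis_order`, `pop_size`, `generations`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fitness(order, corr):
    """Assumed fitness: sum of |correlation| between adjacent axes in the order."""
    return sum(abs(corr[a][b]) for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the other positions in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen axes."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def optimise_axis_order(columns, pop_size=30, generations=100, seed=0):
    """Search for an axis permutation with high adjacent correlation.

    `columns` is a list of dimensions, each a list of numeric values.
    Returns (best_order, best_fitness). This is a heuristic search, so the
    result is not guaranteed to be the global optimum.
    """
    rng = random.Random(seed)
    d = len(columns)
    corr = [[pearson(columns[a], columns[b]) for b in range(d)] for a in range(d)]
    population = [rng.sample(range(d), d) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the fitter half survives unchanged.
        population.sort(key=lambda o: fitness(o, corr), reverse=True)
        elite = population[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = elite + children
    best = max(population, key=lambda o: fitness(o, corr))
    return best, fitness(best, corr)
```

Calling `optimise_axis_order(columns)` on a column-wise dataset returns a permutation of axis indices that could be used to arrange the axes of a parallel coordinates plot. Placing strongly correlated dimensions next to each other is one plausible reading of "maximal correlation as fitness"; other fitness definitions would fit the same skeleton.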
Pages: 52-57
Page count: 6