High-dimensional labeled data analysis with topology representing graphs

Cited by: 20
Authors
Aupetit, M [1]
Catz, T [1]
Affiliations
[1] CEA, DAM, Dept Anal Surveillance Environm, F-91680 Bruyeres Le Chatel, France
Keywords
exploratory data analysis; high-dimensional labeled data; topology representing graph; Delaunay graph; Gabriel graph; Voronoi cell; classification; decision boundary
DOI
10.1016/j.neucom.2004.04.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose the use of topology representing graphs for the exploratory analysis of high-dimensional labeled data. The Delaunay graph contains all the information needed to analyze the topology of the classes (e.g. the number of separate clusters of a given class, the way these clusters are in contact with each other, or the shape of these clusters). The Delaunay graph also makes it possible to sample the decision boundary of the Nearest Neighbor rule, to define a topological criterion of non-linear separability of the classes, and to find data points that lie near the decision boundary, whose labels must therefore be considered carefully. The graph thus provides a way to analyze the complexity of a classification problem, as well as tools for decision support. When the Delaunay graph is intractable because the dimension of the space is too high, we propose using the Gabriel graph instead and discuss the limits of this approach. This analysis technique is complementary to projection techniques, since it handles the data as they are in the data space and so avoids projection distortions. We apply it to the well-known Iris database and to a seismic events database. (C) 2004 Elsevier B.V. All rights reserved.
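As an illustration of the Gabriel graph idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance and a brute-force O(n^3) construction; the function names `gabriel_graph` and `boundary_samples` and the toy data are hypothetical. Two points are joined by a Gabriel edge when no third point lies strictly inside the ball whose diameter is the segment between them. The midpoint of a Gabriel edge whose endpoints carry different labels is equidistant from both endpoints and has no closer point, so it lies on the Nearest Neighbor decision boundary.

```python
import numpy as np

def gabriel_graph(points):
    """Return Gabriel graph edges (i, j) with i < j, by brute force."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Squared Euclidean distances between all pairs of points.
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # k is strictly inside the ball of diameter [i, j]
            # iff d(i,k)^2 + d(j,k)^2 < d(i,j)^2.
            inside = d2[i] + d2[j] < d2[i, j]
            inside[[i, j]] = False  # the endpoints themselves never count
            if not inside.any():
                edges.append((i, j))
    return edges

def boundary_samples(points, labels, edges):
    """Return mixed-label edges, their midpoints, and the endpoint indices."""
    pts = np.asarray(points, dtype=float)
    mixed = [(i, j) for i, j in edges if labels[i] != labels[j]]
    midpoints = np.array([(pts[i] + pts[j]) / 2 for i, j in mixed])
    near_boundary = sorted({k for edge in mixed for k in edge})
    return mixed, midpoints, near_boundary

# Toy data (hypothetical): four collinear points, two per class.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [0, 0, 1, 1]
edges = gabriel_graph(X)
mixed, midpoints, near_boundary = boundary_samples(X, y, edges)
```

In this toy case only consecutive points are linked, and the single mixed-label edge joins the two points closest to the class change. The brute-force loop is meant for small data sets; larger ones would need a more careful construction.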
Pages: 139-169
Page count: 31