Data segmentation based on the local intrinsic dimension

Cited by: 0
Authors
Michele Allegra
Elena Facco
Francesco Denti
Alessandro Laio
Antonietta Mira
Affiliations
[1] Aix Marseille Université, Institut de Neurosciences de la Timone UMR 7289
[2] CNRS
[3] Scuola Internazionale Superiore di Studi Avanzati
[4] University of California
[5] International Centre for Theoretical Physics
[6] Università della Svizzera italiana
[7] Università dell’Insubria
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded versus unfolded configurations in a protein molecular dynamics trajectory, active versus non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
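The idea of assigning each point a local ID and segmenting on it can be illustrated with a minimal sketch. The abstract does not specify the estimator, so this is not the authors' method: it assumes a simple two-nearest-neighbor ratio estimator (the ratio of each point's second to first neighbor distance), averaged over a hypothetical neighborhood of `k` points, followed by a hypothetical threshold of 2.0. The function names, `k`, the threshold, and the synthetic data (a line segment and a filled cube in 3D) are all assumptions made for illustration.

```python
import numpy as np

def two_nn_ratios(X):
    """Return log(r2/r1) per point and each point's neighbors sorted by distance."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    order = np.argsort(dist, axis=1)          # column 0 is the point itself
    rows = np.arange(len(X))
    r1 = dist[rows, order[:, 1]]              # distance to nearest neighbor
    r2 = dist[rows, order[:, 2]]              # distance to second-nearest neighbor
    return np.log(r2 / r1), order

def local_id(X, k=50):
    """Local ID per point: maximum-likelihood estimate k / sum(log mu)
    taken over the point and its k - 1 nearest neighbors (assumed scheme)."""
    log_mu, order = two_nn_ratios(X)
    neigh = order[:, :k]
    return k / log_mu[neigh].sum(axis=1)

# Synthetic data: a 1-dimensional segment and a 3-dimensional cube, far apart.
rng = np.random.default_rng(0)
line = np.zeros((400, 3))
line[:, 0] = rng.uniform(0.0, 1.0, 400)
cube = rng.uniform(0.0, 1.0, (400, 3)) + 10.0
X = np.vstack([line, cube])

ids = local_id(X, k=50)
labels = ids > 2.0                            # hypothetical threshold: low-ID vs high-ID
print("mean local ID, line points:", ids[:400].mean())
print("mean local ID, cube points:", ids[400:].mean())
```

On data like this, the points on the segment should receive local ID estimates near 1 and the points in the cube estimates near 3, so a threshold between the two separates the regions without using any label information.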
Related papers
50 in total
  • [1] Data segmentation based on the local intrinsic dimension
    Allegra, Michele
    Facco, Elena
    Denti, Francesco
    Laio, Alessandro
    Mira, Antonietta
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [2] FRACTAL DIMENSION AND LOCAL INTRINSIC DIMENSION
    PASSAMANTE, A
    HEDIGER, T
    GOLLUB, M
    PHYSICAL REVIEW A, 1989, 39 (07) : 3640 - 3645
  • [3] Intrinsic dimension estimation based on local adjacency information
    Qiu, Haiquan
    Yang, Youlong
    Li, Benchong
    INFORMATION SCIENCES, 2021, 558 : 21 - 33
  • [4] Combining intrinsic dimension and local tangent space for manifold spectral clustering image segmentation
    Xiaoling Yao
    Rongguo Zhang
    Jing Hu
    Kai Chang
    Xiaojun Liu
    Jian Zhao
    Soft Computing, 2022, 26 : 9557 - 9572
  • [5] Combining intrinsic dimension and local tangent space for manifold spectral clustering image segmentation
    Yao, Xiaoling
    Zhang, Rongguo
    Hu, Jing
    Chang, Kai
    Liu, Xiaojun
    Zhao, Jian
    SOFT COMPUTING, 2022, 26 (18) : 9557 - 9572
  • [6] Remote sensing image segmentation based on local fractal dimension
    Department of Optical Engineering, Beijing Institute of Technology, Beijing 100081, China
    Guangdian Gongcheng, 2008, 1 (136-139):
  • [7] Estimating the intrinsic dimension of data with a fractal-based method
    Camastra, F
    Vinciarelli, A
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (10) : 1404 - 1407
  • [8] On Local Intrinsic Dimension Estimation and Its Applications
    Carter, Kevin M.
    Raich, Raviv
    Hero, Alfred O., III
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (02) : 650 - 663
  • [9] INTRINSIC DIMENSION OF GEOMETRIC DATA SETS
    Hanika, Tom
    Schneider, Friedrich Martin
    Stumme, Gerd
    TOHOKU MATHEMATICAL JOURNAL, 2022, 74 (01) : 23 - 52
  • [10] The segmentation algorithm of man-made regions based on local fractal dimension
    Guo, Dongwei
    Cao, Lisai
    Zou, Yun
    FOURTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2012), 2012, 8334