What is the Intrinsic Dimension of Your Binary Data? - and How to Compute it Quickly

Authors
Hanika, Tom [1]
Hille, Tobias [2, 3]
Affiliations
[1] Univ Hildesheim, Intelligent Informat Syst, Hildesheim, Germany
[2] Univ Kassel, Knowledge & Data Engn Grp, Kassel, Germany
[3] Univ Kassel, Interdisciplinary Res Ctr Informat Syst Design, Kassel, Germany
Source
CONCEPTUAL KNOWLEDGE STRUCTURES, CONCEPTS 2024 | 2024 / Vol. 14914
Keywords
intrinsic dimension; high-dimensional data; binary data; extrinsic dimension
DOI
10.1007/978-3-031-67868-4_7
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Dimensionality is an important aspect of analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. addressed the question of an (interpretable) dimension for binary data tables by introducing a normalized correlation dimension. In the present work, we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets. To do this, we present a novel approximation for this ID that is based on computing concepts only up to a certain support value. We demonstrate and evaluate our approximation using all available datasets from Tatti et al., which have between 469 and 41271 extrinsic dimensions. (Source code and more figures are available at https://codeberg.org/thille/bd-gid.)
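The abstract's phrase "computing concepts only up to a certain support value" can be illustrated with a small sketch. This is not the authors' implementation (their code lives in the repository linked above) and it does not compute the intrinsic dimension itself. It only assumes one plausible reading of the idea: enumerate the formal concepts of a binary table whose extent contains at least `min_support` rows. The function names `frequent_concepts`, `extent`, and `intent` are hypothetical, and the brute-force enumeration is only suitable for tiny tables.

```python
from itertools import combinations


def extent(rows, attrs):
    """Indices of all rows (objects) that have every attribute in attrs."""
    return frozenset(i for i, row in enumerate(rows) if attrs <= row)


def intent(rows, objects, n_attrs):
    """Attributes shared by every row in objects (all attributes if empty)."""
    common = set(range(n_attrs))
    for i in objects:
        common &= rows[i]
    return frozenset(common)


def frequent_concepts(table, min_support):
    """Return the formal concepts (extent, intent) of a 0/1 table whose
    extent has at least min_support rows.

    Brute force over all attribute subsets, so exponential in the number
    of columns. Illustrative only, not the paper's algorithm.
    """
    n_attrs = len(table[0]) if table else 0
    # Represent each row as the set of column indices holding a 1.
    rows = [frozenset(j for j, v in enumerate(r) if v) for r in table]
    found = set()
    for k in range(n_attrs + 1):
        for attrs in combinations(range(n_attrs), k):
            ext = extent(rows, frozenset(attrs))
            if len(ext) < min_support:
                continue  # below the support threshold: skip this concept
            # Closing the attribute set yields the concept's intent.
            found.add((ext, intent(rows, ext, n_attrs)))
    return found


if __name__ == "__main__":
    table = [
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 1],
    ]
    for ext, itt in sorted(frequent_concepts(table, 2), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Raising `min_support` shrinks the set of concepts that must be computed, which is the kind of trade-off between cost and completeness that a support-bounded approximation relies on.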
Pages: 97-112
Page count: 16