Clustering and Classification to Evaluate Data Reduction via Johnson-Lindenstrauss Transform

Cited by: 0
Authors
Ghalib, Abdulaziz [1 ]
Jessup, Tyler D. [1 ]
Johnson, Julia [1 ]
Monemian, Seyedamin [1 ]
Affiliations
[1] Laurentian Univ, Dept Math & Comp Sci, Sudbury, ON P3E 2C6, Canada
Keywords
Data reduction; High dimensional data; Clustering; Classification; Johnson-Lindenstrauss Transform; DEFICIT HYPERACTIVITY DISORDER; VARIANCE;
DOI
10.1007/978-3-030-39442-4_16
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A dataset is a matrix X with n × d entries, where n is the number of observations and d is the number of variables (dimensions). Johnson and Lindenstrauss assert that a transformation exists that yields a matrix with n × k entries, k << d, such that certain geometric properties of the original matrix are preserved. The property we seek concerns all pairs of points in X: the distance between any two points should equal, within a small acceptable level of distortion, the distance between the corresponding two points in the reduced dataset. Does it follow that the semantic content of the data is preserved by the transformation? We answer in the affirmative: meaning in the original dataset was preserved in the reduced dataset, as confirmed by comparing clustering and classification results on the original and reduced datasets.
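The distance-preservation property described in the abstract can be illustrated with a minimal sketch. The abstract does not say which construction the authors used, so the code below assumes a dense Gaussian random projection scaled by 1/sqrt(k), one standard way to realize a Johnson-Lindenstrauss transform. The function names (`random_projection`, `max_distortion`), the sizes n, d, k, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def random_projection(X, k, seed=0):
    """Map the rows of X (n x d) to k dimensions.

    Assumption: a Gaussian matrix with entries N(0, 1) / sqrt(k),
    which is one common JL construction, not necessarily the paper's.
    """
    rng = random.Random(seed)
    d = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * R[i][j] for i in range(d)) for j in range(k)] for row in X]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_distortion(X, Y):
    """Largest relative change in pairwise distance between X and Y."""
    worst = 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            ratio = distance(Y[i], Y[j]) / distance(X[i], X[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


# Synthetic data: n observations in d dimensions, reduced to k << d.
rng = random.Random(42)
n, d, k = 20, 500, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
Y = random_projection(X, k)

print(len(Y), len(Y[0]))       # shape of the reduced dataset
print(max_distortion(X, Y))    # worst relative distortion over all pairs
```

Checking whether clustering or classification results survive the reduction, as the paper does, would be a separate step run on both X and Y; this sketch covers only the geometric property.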
Pages: 190-209
Page count: 20