Clustering and Classification to Evaluate Data Reduction via Johnson-Lindenstrauss Transform

Authors
Ghalib, Abdulaziz [1]
Jessup, Tyler D. [1]
Johnson, Julia [1]
Monemian, Seyedamin [1]
Affiliations
[1] Laurentian Univ, Dept Math & Comp Sci, Sudbury, ON P3E 2C6, Canada
Keywords
Data reduction; High dimensional data; Clustering; Classification; Johnson-Lindenstrauss Transform; DEFICIT HYPERACTIVITY DISORDER; VARIANCE;
DOI
10.1007/978-3-030-39442-4_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A dataset is a matrix X with n × d entries, where n is the number of observations and d is the number of variables (dimensions). Johnson and Lindenstrauss assert that a transformation exists that yields a matrix with n × k entries, k << d, such that certain geometric properties of the original matrix are preserved. The property we seek concerns all pairs of points in X: the distance between any two points should equal, within a small acceptable level of distortion, the distance between the same two points in the reduced dataset. Does it follow that the semantic content of the data is preserved by the transformation? We answer in the affirmative: the meaning in the original dataset was preserved in the reduced dataset. This was confirmed by comparing clustering and classification results on the original and reduced datasets.
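The distance-preservation property described in the abstract can be illustrated with a minimal sketch. It assumes a dense Gaussian random projection, which is one common Johnson-Lindenstrauss construction and not necessarily the one used in the paper. The function names `jl_project` and `max_distortion`, the synthetic data, and the sizes n = 20, d = 500, k = 200 are all hypothetical choices made for illustration.

```python
import math
import random


def jl_project(X, k, seed=0):
    """Project the rows of X (n x d) to k dimensions.

    Assumption: a dense Gaussian projection matrix with entries drawn
    from N(0, 1/k); other JL constructions exist.
    """
    rng = random.Random(seed)
    d = len(X[0])
    scale = 1.0 / math.sqrt(k)
    # Store the d x k projection matrix column by column (k columns of length d).
    cols = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(x * r for x, r in zip(row, col)) for col in cols] for row in X]


def max_distortion(X, Y):
    """Largest relative change in pairwise Euclidean distance from X to Y."""
    worst = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            ratio = math.dist(Y[i], Y[j]) / math.dist(X[i], X[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


# Synthetic data for illustration only: n = 20 points in d = 500 dimensions.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(20)]
Y = jl_project(X, k=200)

print(len(Y), len(Y[0]))     # shape of the reduced dataset
print(max_distortion(X, Y))  # worst relative distortion over all pairs
```

Increasing k should tend to shrink the reported distortion, at the cost of a larger reduced dataset. Whether clustering or classification results survive the reduction is a separate empirical question, which is the one the paper examines.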
Pages: 190-209
Number of pages: 20
Related papers
50 in total
  • [1] Dimension reduction for data streams based on Johnson-Lindenstrauss transform
    Yang, Jing
    Zhao, Jia-Shi
    Zhang, Jian-Pei
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2013, 43 (06): 1626 - 1630
  • [2] A Sparse Johnson-Lindenstrauss Transform
    Dasgupta, Anirban
    Kumar, Ravi
    Sarlos, Tamas
    STOC 2010: PROCEEDINGS OF THE 2010 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2010: 341 - 350
  • [3] Private Query Release via the Johnson-Lindenstrauss Transform
    Nikolov, Aleksandar
    PROCEEDINGS OF THE 2023 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2023: 4982 - 5002
  • [4] Differential Private POI Queries via Johnson-Lindenstrauss Transform
    Yang, Mengmeng
    Zhu, Tianqing
    Liu, Bo
    Xiang, Yang
    Zhou, Wanlei
    IEEE ACCESS, 2018, 6 : 29685 - 29699
  • [5] Privacy Preserving Collaborative Filtering via the Johnson-Lindenstrauss Transform
    Yang, Mengmeng
    Zhu, Tianqing
    Ma, Lichuan
    Xiang, Yang
    Zhou, Wanlei
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017: 417 - 424
  • [6] An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
    Ailon, Nir
    Liberty, Edo
    ACM TRANSACTIONS ON ALGORITHMS, 2013, 9 (03)
  • [7] PERFORMANCE OF JOHNSON-LINDENSTRAUSS TRANSFORM FOR k-MEANS AND k-MEDIANS CLUSTERING
    Makarychev, Konstantin
    Makarychev, Yury
    Razenshteyn, Ilya
    SIAM JOURNAL ON COMPUTING, 2023, 52 (02)
  • [8] Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
    Makarychev, Konstantin
    Makarychev, Yury
    Razenshteyn, Ilya
    PROCEEDINGS OF THE 51ST ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '19), 2019: 1027 - 1038
  • [9] Dimensionality reduction: beyond the Johnson-Lindenstrauss bound
    Bartal, Yair
    Recht, Ben
    Schulman, Leonard J.
    PROCEEDINGS OF THE TWENTY-SECOND ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2011: 868 - 887
  • [10] The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy
    Blocki, Jeremiah
    Blum, Avrim
    Datta, Anupam
    Sheffet, Or
    2012 IEEE 53RD ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2012: 410 - 419