Clustering and Classification to Evaluate Data Reduction via Johnson-Lindenstrauss Transform

Authors
Ghalib, Abdulaziz [1]
Jessup, Tyler D. [1]
Johnson, Julia [1]
Monemian, Seyedamin [1]
Affiliation
[1] Laurentian Univ, Dept Math & Comp Sci, Sudbury, ON P3E 2C6, Canada
Keywords
Data reduction; High dimensional data; Clustering; Classification; Johnson-Lindenstrauss Transform; DEFICIT HYPERACTIVITY DISORDER; VARIANCE;
DOI
10.1007/978-3-030-39442-4_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A dataset is an n × d matrix X, where n is the number of observations and d is the number of variables (dimensions). Johnson and Lindenstrauss showed that a transformation exists that maps X to an n × k matrix, with k ≪ d, while preserving certain geometric properties of the original data. The property we seek is that, for every pair of points in X, the distance between them in the reduced dataset equals their distance in the original dataset up to a small, acceptable level of distortion. Does it follow that the semantic content of the data is also preserved by the transformation? We answer in the affirmative: the meaning in the original dataset was preserved in the reduced dataset, as confirmed by comparing clustering and classification results on the original and reduced datasets.
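The distance-preservation property above is the Johnson-Lindenstrauss lemma: for any 0 < ε < 1 and any n points in R^d, there is a linear map f: R^d → R^k with k = O(ε⁻² log n) such that (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖² for every pair of points u, v. The Python sketch below illustrates this under stated assumptions. It uses the common Gaussian random-projection construction (the abstract does not say which JL transform the authors used), measures the worst pairwise distortion, and compares a 1-nearest-neighbour classifier before and after reduction. The synthetic data, the helper names (`jl_project`, `sq_dists`, `max_distortion`, `one_nn_accuracy`), and the choice of 1-NN are hypothetical illustrations, not the paper's experimental setup.

```python
import numpy as np


def jl_project(X, k, seed=0):
    """Map the n x d matrix X to n x k with a Gaussian random matrix.

    Entries are i.i.d. N(0, 1/k), so squared distances are preserved in
    expectation. This is one standard JL construction, assumed here; it is
    not necessarily the transform used in the paper.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R


def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    D = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding


def max_distortion(X, Y):
    """Largest |ratio - 1| over all pairs i < j, where ratio is the reduced
    squared distance divided by the original squared distance."""
    i, j = np.triu_indices(X.shape[0], k=1)
    ratio = sq_dists(Y, Y)[i, j] / sq_dists(X, X)[i, j]
    return float(np.max(np.abs(ratio - 1.0)))


def one_nn_accuracy(X, y, train_idx, test_idx):
    """Accuracy of a 1-nearest-neighbour classifier, used here as a simple
    hypothetical stand-in for the paper's classifiers."""
    D = sq_dists(X[test_idx], X[train_idx])
    pred = y[train_idx][np.argmin(D, axis=1)]
    return float(np.mean(pred == y[test_idx]))


# Hypothetical synthetic data: three Gaussian classes, n = 150, d = 5000.
rng = np.random.default_rng(42)
n, d, k = 150, 5000, 500
y = rng.integers(0, 3, size=n)
centers = rng.normal(0.0, 1.0, size=(3, d))
X = centers[y] + rng.normal(0.0, 1.0, size=(n, d))

# Reduce to k dimensions and split the same rows into train/test halves.
Y = jl_project(X, k)
perm = rng.permutation(n)
train_idx, test_idx = perm[: n // 2], perm[n // 2 :]

distortion = max_distortion(X, Y)
acc_original = one_nn_accuracy(X, y, train_idx, test_idx)
acc_reduced = one_nn_accuracy(Y, y, train_idx, test_idx)
print(f"max pairwise distortion: {distortion:.3f}")
print(f"1-NN accuracy, original (d={d}): {acc_original:.3f}")
print(f"1-NN accuracy, reduced  (k={k}): {acc_reduced:.3f}")
```

A small maximum distortion together with matching accuracies is the kind of agreement the abstract describes. On real data the outcome depends on the target dimension k, the tolerated distortion ε, and the clustering or classification method, so this sketch should be read as an illustration rather than a reproduction of the paper's results.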
Pages: 190-209
Number of pages: 20