Clustering and Classification to Evaluate Data Reduction via Johnson-Lindenstrauss Transform

Authors
Ghalib, Abdulaziz [1]
Jessup, Tyler D. [1]
Johnson, Julia [1]
Monemian, Seyedamin [1]
Affiliations
[1] Laurentian Univ, Dept Math & Comp Sci, Sudbury, ON P3E 2C6, Canada
Keywords
Data reduction; High dimensional data; Clustering; Classification; Johnson-Lindenstrauss Transform; DEFICIT HYPERACTIVITY DISORDER; VARIANCE
DOI
10.1007/978-3-030-39442-4_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A dataset is an n × d matrix X, where n is the number of observations and d is the number of variables (dimensions). Johnson and Lindenstrauss assert that a transformation exists that yields an n × k matrix, k << d, such that certain geometric properties of the original matrix are preserved. The property we seek is that, for every pair of points in X, the distance between the two points in the reduced dataset equals the corresponding distance in the original dataset to within a small, acceptable level of distortion. Does it follow that the semantic content of the data is preserved by the transformation? We answer in the affirmative: the meaning in the original dataset was preserved in the reduced dataset, as confirmed by comparing clustering and classification results on the original and reduced datasets.
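The distance-preservation property described in the abstract can be illustrated with a minimal sketch. It assumes a dense Gaussian random projection, which is one common way to realize a Johnson-Lindenstrauss transform; the abstract does not say which construction the authors used. The function names (`jl_project`, `pairwise_distances`, `max_distortion`), the synthetic data, and the sizes n = 50, d = 2000, k = 500 are all hypothetical choices made for illustration.

```python
import numpy as np


def jl_project(X, k, seed=0):
    """Project the rows of X from d to k dimensions with a Gaussian random matrix.

    Assumption: a dense Gaussian projection; other JL constructions exist.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries drawn from N(0, 1/k) so squared lengths are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


def pairwise_distances(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    iu = np.triu_indices(X.shape[0], k=1)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(d2[iu], 0.0))


def max_distortion(X, Y):
    """Largest relative change in any pairwise distance after reduction."""
    orig = pairwise_distances(X)
    reduced = pairwise_distances(Y)
    return float(np.max(np.abs(reduced / orig - 1.0)))


# Synthetic stand-in for a dataset: n = 50 observations, d = 2000 variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2000))
Y = jl_project(X, k=500)  # reduced dataset with k = 500 columns
distortion = max_distortion(X, Y)
print(f"reduced shape: {Y.shape}, max relative distortion: {distortion:.3f}")
```

A small maximum distortion indicates that the reduced matrix keeps pairwise distances close to the originals. Checking whether clustering or classification results also agree, as the abstract describes, would require running those algorithms on both `X` and `Y`, which this sketch does not attempt.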
Pages: 190-209
Number of pages: 20