Efficient Dimensionality Reduction for Sparse Binary Data

Cited by: 0
Authors
Pratap, Rameshwar
Kulkarni, Raghav [1]
Sohony, Ishan [2]
Affiliations
[1] CMI, Chennai, Tamil Nadu, India
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
Dimensionality Reduction; Sketching; Binary Data; Similarity Search; Locality Sensitive Hashing;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a dimensionality reduction (sketching) algorithm for high-dimensional, sparse, binary data. The algorithm produces a single sketch that simultaneously preserves multiple similarity measures, including Hamming distance, inner product, and Jaccard similarity [12]. In contrast to the "local projection" strategy used by most earlier algorithms [6], [4], [7], our approach exploits sparsity and combines two strategies: (1) partitioning the dimensions into several buckets, and (2) obtaining "global linear summaries" within those buckets. Our algorithm is faster than the existing state of the art, and it preserves the binary format of the data after dimensionality reduction, which makes the sketch space-efficient. It can also be easily adapted to streaming and incremental learning frameworks. We give a rigorous theoretical analysis of the dimensionality reduction bounds and complement it with extensive experiments. The algorithm is simple and easy to implement in practice.
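The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say which linear summary is used, so this example assumes the summary is the parity (sum modulo 2) of the bits that fall into each bucket, which keeps the output binary. The function names `make_bucket_map`, `sketch`, and `hamming`, and all parameter values, are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random


def make_bucket_map(dim, num_buckets, seed=0):
    """Assign each of the `dim` input dimensions to a random bucket.

    Assumption: a uniform random partition; the abstract only says the
    dimensions are partitioned into several buckets.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_buckets) for _ in range(dim)]


def sketch(nonzero_indices, bucket_map, num_buckets):
    """Compress a sparse binary vector, given as the set of its 1-positions.

    Assumption: the "linear summary" of a bucket is the parity of the
    bits mapped to it, so the result is again a binary vector.
    Only the nonzero positions are visited, which exploits sparsity.
    """
    out = [0] * num_buckets
    for i in nonzero_indices:
        out[bucket_map[i]] ^= 1  # flip the bucket's parity bit
    return out


def hamming(a, b):
    """Hamming distance between two equal-length binary lists."""
    return sum(x != y for x, y in zip(a, b))


# Illustrative values only.
dim, num_buckets = 10_000, 256
bucket_map = make_bucket_map(dim, num_buckets, seed=42)

u = {3, 17, 256, 4096, 9000}
v = {3, 17, 300, 4096, 9999}

su = sketch(u, bucket_map, num_buckets)
sv = sketch(v, bucket_map, num_buckets)

original_distance = len(u ^ v)          # size of the symmetric difference
sketched_distance = hamming(su, sv)
print(original_distance, sketched_distance)
```

Because parity is linear over GF(2), the sketch of the symmetric difference of two vectors equals the XOR of their sketches, so under this assumption the sketched Hamming distance never exceeds the original one and matches it whenever no two differing positions collide in a bucket. A coordinate flip in the input changes exactly one sketch bit, which is one way to see why such a scheme would suit streaming updates.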
Pages: 152 - 157
Number of pages: 6
Related papers
50 in total
  • [1] Sparse Unsupervised Dimensionality Reduction for Multiple View Data
    Han, Yahong
    Wu, Fei
    Tao, Dacheng
    Shao, Jian
    Zhuang, Yueting
    Jiang, Jianmin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2012, 22 (10) : 1485 - 1496
  • [2] Dimensionality reduction based on binary encoding for hyperspectral data
    Jijon Palma, Mario Ernesto
    Lima Machado, Alvaro Muriel
    Silva Centeno, Jorge Antonio
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2019, 40 (09) : 3401 - 3420
  • [3] Efficient Sketching Algorithm for Sparse Binary Data
    Pratap, Rameshwar
    Bera, Debajyoti
    Revanuru, Karthik
    2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 508 - 517
  • [4] Sparse Dimensionality Reduction Revisited
    Høgsgaard, Mikael Møller
    Kamma, Lior
    Larsen, Kasper Green
    Nelson, Jelani
    Schwiegelshohn, Chris
    arXiv, 2023,
  • [5] Dimensionality reduction for binary data through the projection of natural parameters
    Landgraf, Andrew J.
    Lee, Yoonkyung
    JOURNAL OF MULTIVARIATE ANALYSIS, 2020, 180
  • [6] Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data
    Palumbo, Francesco
    D'Enza, Alfonso Iodice
    ADVANCES IN DATA ANALYSIS, DATA HANDLING AND BUSINESS INTELLIGENCE, 2010, : 45 - +
  • [7] Sparse Kernel Entropy Component Analysis for Dimensionality Reduction of Neuroimaging Data
    Jiang, Qikun
    Shi, Jun
    2014 36TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2014, : 3366 - 3369
  • [8] Sparse kernel entropy component analysis for dimensionality reduction of biomedical data
    Shi, Jun
    Jiang, Qikun
    Zhang, Qi
    Huang, Qinghua
    Li, Xuelong
    NEUROCOMPUTING, 2015, 168 : 930 - 940
  • [9] Dimensionality reduction for regularization of sparse data-driven RANS simulations
    Piroozmand, Pasha
    Brenner, Oliver
    Jenny, Patrick
    JOURNAL OF COMPUTATIONAL PHYSICS, 2023, 492
  • [10] Category Guided Sparse Preserving Projection for Biometric Data Dimensionality Reduction
    Huang, Qianying
    Wu, Yunsong
    Zhao, Chenqiu
    Zhang, Xiaohong
    Yang, Dan
    Biometric Recognition, 2016, 9967 : 539 - 546