Random indexing of multidimensional data

Cited by: 6
Authors
Sandin, Fredrik [1]
Emruli, Blerim [2]
Sahlgren, Magnus [3]
Affiliations
[1] Luleå University of Technology, EISLAB, S-97187 Luleå, Sweden
[2] SICS Swedish ICT, S-72213 Västerås, Sweden
[3] SICS Swedish ICT, S-16429 Kista, Sweden
Keywords
Data mining; Random embeddings; Dimensionality reduction; Sparse coding; Semantic similarity; Streaming algorithm; Natural language processing; Johnson-Lindenstrauss
DOI
10.1007/s10115-016-1012-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays, thereby enabling approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which are also the theoretical basis for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments, including comparisons with ordinary RI and principal component analysis, which demonstrate that a multidimensional generalisation of RI is feasible. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open-source implementation of generalised RI is provided.
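The incremental-update and inner-product properties described in the abstract can be illustrated with a small second-order (matrix) case. The Python sketch below is not the authors' open-source implementation. The class `RandomIndex2D`, its methods `index_vector`, `update` and `context`, the helper `cosine`, and the parameter `nnz` (the number of nonzero ±1 entries per index vector) are hypothetical names chosen for illustration. It assumes sparse ternary index vectors of the kind used in ordinary RI and accumulates each weighted relation (i, j, w) into a fixed-size k0 × k1 state array as S += w · r0(i) r1(j)ᵀ. The paper's exact encoding and normalisation may differ.

```python
import math
import random
from typing import Dict, List, Tuple

# A sparse ternary index vector, stored as (position, sign) pairs with sign in {-1, +1}.
SparseVec = List[Tuple[int, int]]


class RandomIndex2D:
    """Hypothetical sketch of second-order random indexing (not the paper's code).

    Index i along dimension d is mapped to a random sparse ternary vector of
    length k[d] with `nnz` nonzero entries. Weighted relations (i, j, w) are
    accumulated into a fixed-size k0 x k1 state array S += w * outer(r0(i), r1(j)).
    """

    def __init__(self, k0: int, k1: int, nnz: int, seed: int = 0) -> None:
        self.k = (k0, k1)
        self.nnz = nnz
        self.rng = random.Random(seed)
        # One lazily populated table of index vectors per dimension.
        self.index: Tuple[Dict[int, SparseVec], Dict[int, SparseVec]] = ({}, {})
        self.state = [[0.0] * k1 for _ in range(k0)]

    def index_vector(self, dim: int, i: int) -> SparseVec:
        """Return the index vector for index i of dimension dim, creating it on first use."""
        table = self.index[dim]
        if i not in table:
            # Distinct positions, so the squared norm of every index vector is exactly nnz.
            positions = self.rng.sample(range(self.k[dim]), self.nnz)
            table[i] = [(p, self.rng.choice((-1, 1))) for p in positions]
        return table[i]

    def update(self, i: int, j: int, w: float = 1.0) -> None:
        """Add weight w for relation (i, j); touches only nnz * nnz cells of the state."""
        ri = self.index_vector(0, i)
        rj = self.index_vector(1, j)
        for p, sp in ri:
            row = self.state[p]
            for q, sq in rj:
                row[q] += w * sp * sq

    def context(self, i: int) -> List[float]:
        """Project the state onto r0(i).

        The result is approximately nnz * sum_j A[i, j] * r1(j), i.e. a random
        projection of row i of the implicit data matrix A, plus crosstalk noise
        from index vectors that happen to overlap.
        """
        out = [0.0] * self.k[1]
        for p, sp in self.index_vector(0, i):
            for q, value in enumerate(self.state[p]):
                out[q] += sp * value
        return out


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


if __name__ == "__main__":
    ri = RandomIndex2D(k0=500, k1=500, nnz=8, seed=1)
    # Rows 0 and 1 have identical relation patterns; row 2 relates to a different column.
    for i, j, w in [(0, 10, 1.0), (0, 11, 2.0), (1, 10, 1.0), (1, 11, 2.0), (2, 99, 3.0)]:
        ri.update(i, j, w)
    print(cosine(ri.context(0), ri.context(1)))  # typically close to 1
    print(cosine(ri.context(0), ri.context(2)))  # typically close to 0
```

Projecting the state onto r0(i) recovers, up to the factor `nnz` and crosstalk from overlapping index vectors, a random projection of row i of a data matrix that is never stored explicitly. Cosine similarities between such projections therefore approximate similarities between rows. Each update costs nnz² operations regardless of how many distinct indices have been seen, which is what makes this style of encoding suitable for streams.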
Pages: 267-290
Number of pages: 24
Related papers (50 in total)
  • [31] Dynamic indexing for multidimensional non-ordered discrete data spaces using a data-partitioning approach
    Qian, Gang
    Zhu, Qiang
    Xue, Qiang
    Pramanik, Sakti
    ACM Transactions on Database Systems, 2006, 31(2): 439-484
  • [32] Indexing schemes for random points
    Koutsoupias, E
    Taylor, D
    Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1999: 596-602
  • [33] Implementing Random Indexing on GPU
    Polok, Lukas
    Smrz, Pavel
    High Performance Computing Symposium 2011 (HPC 2011), 2011 Spring Simulation Multiconference, Book 6 of 8, 2011, 43(2): 134-142
  • [34] Human activity recognition using multidimensional indexing
    Ben-Arie, J
    Wang, ZQ
    Pandit, P
    Rajaram, S
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(8): 1091-1104
  • [35] An evaluation framework for multidimensional multimedia descriptor indexing
    Goncalves, Bruno
    Calistru, Catalin
    Ribeiro, Cristina
    David, Gabriel
    2007 IEEE 23rd International Conference on Data Engineering Workshop, Vols 1-2, 2007: 95 ff.
  • [36] Indexing of multidimensional lookup tables in embedded systems
    Vrhel, MJ
    IEEE Transactions on Image Processing, 2004, 13(10): 1319-1326
  • [37] Variable screening for Lasso based on multidimensional indexing
    Żogała-Siudem, Barbara
    Jaroszewicz, Szymon
    Data Mining and Knowledge Discovery, 2024, 38: 49-78
  • [38] Perfect KDB-tree: A compact KDB-tree structure for indexing multidimensional data
    Lin, HY
    Huang, PW
    Third International Conference on Information Technology and Applications, Vol 2, Proceedings, 2005: 411-414
  • [39] Analyzing design choices for distributed multidimensional indexing
    Nam, Beomseok
    Sussman, Alan
    The Journal of Supercomputing, 2012, 59: 1552-1576
  • [40] A dynamic programming scheme with multidimensional step indexing
    Levit-Gurevich, L. K.
    Yaroshevskii, D. M.
    Automation and Remote Control, 2006, 67: 1373-1388