Random indexing of multidimensional data

Cited by: 6
Authors
Sandin, Fredrik [1]
Emruli, Blerim [2]
Sahlgren, Magnus [3]
Affiliations
[1] Lulea Univ Technol, EISLAB, S-97187 Lulea, Sweden
[2] SICS Swedish ICT, S-72213 Vasteras, Sweden
[3] SICS Swedish ICT, S-16429 Kista, Sweden
Keywords
Data mining; Random embeddings; Dimensionality reduction; Sparse coding; Semantic similarity; Streaming algorithm; Natural language processing; Johnson-Lindenstrauss
DOI
10.1007/s10115-016-1012-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and therefore enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which is the theoretical basis also for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments which demonstrate that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided.
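The incremental update and on-the-fly inner product mentioned in the abstract can be illustrated with a minimal sketch for a two-dimensional array of relationship weights. This is not the authors' open source implementation. The class name `RandomIndex2D`, the helper `index_vector`, and the parameters `dim`, `nonzeros` and `seed` are hypothetical, and the sketch assumes sparse ternary index vectors with an equal number of +1 and -1 entries.

```python
import random


def index_vector(key, dim, nonzeros, seed=0):
    """Sparse ternary index vector as {position: +1 or -1}, deterministic per key."""
    rng = random.Random(f"{seed}:{key}")
    positions = rng.sample(range(dim), nonzeros)
    half = nonzeros // 2
    return {p: (1 if n < half else -1) for n, p in enumerate(positions)}


class RandomIndex2D:
    """Fixed-size dim x dim state that accumulates weights of (row, column) pairs."""

    def __init__(self, dim=64, nonzeros=4, seed=0):
        self.dim = dim
        self.nonzeros = nonzeros
        self.seed = seed
        # The state never grows, no matter how many distinct keys are seen.
        self.state = [[0.0] * dim for _ in range(dim)]

    def _vec(self, axis, key):
        # Separate index vectors per axis, so "x" as a row differs from "x" as a column.
        return index_vector((axis, key), self.dim, self.nonzeros, self.seed)

    def add(self, row_key, col_key, weight):
        """Incremental update: add weight times the outer product of the index vectors."""
        r = self._vec(0, row_key)
        c = self._vec(1, col_key)
        for a, sa in r.items():
            for b, sb in c.items():
                self.state[a][b] += weight * sa * sb

    def estimate(self, row_key, col_key):
        """Approximate the stored weight by an inner product with the same outer product."""
        r = self._vec(0, row_key)
        c = self._vec(1, col_key)
        total = sum(self.state[a][b] * sa * sb
                    for a, sa in r.items()
                    for b, sb in c.items())
        return total / (self.nonzeros ** 2)


ri = RandomIndex2D(dim=64, nonzeros=4)
ri.add("cat", "purrs", 2.0)
ri.add("cat", "purrs", 1.0)  # incremental update of the same relationship
print(ri.estimate("cat", "purrs"))  # 3.0 here, because only one pair is stored
```

With a single stored pair the estimate is exact. Once many pairs share the same fixed-size state, their index vectors overlap at random and the estimate becomes an approximation whose error depends on `dim` and `nonzeros`.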
Pages: 267-290
Number of pages: 24
Related papers
50 in total
  • [1] Random indexing of multidimensional data
    Fredrik Sandin
    Blerim Emruli
    Magnus Sahlgren
    Knowledge and Information Systems, 2017, 52 : 267 - 290
  • [2] Indexing Method for Multidimensional Vector Data
    Terry, Justin
    Stantic, Bela
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2013, 10 (03) : 1077 - 1104
  • [3] An Indexing Structure for Dynamic Multidimensional Data in Vector Space
    Mikhaylova, Elena
    Novikov, Boris
    Volokhov, Anton
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2013, 186 : 185 - 193
  • [4] Fault Tolerance Based Indexing for Multidimensional Data Bases
    Jain, Rachna
    Taygi, Praney
    Sharma, Mayank
    Khatri, Sunil Kumar
    PROCEEDINGS 2019 AMITY INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AICAI), 2019 : 129 - 133
  • [5] The OTree: Multidimensional Indexing with efficient data Sampling for HPC
    Cugnasco, Cesare
    Calmet, Hadrien
    Santamaria, Pol
    Sirvent, Raul
    Eguzkitza, Ane Beatriz
    Houzeaux, Guillaume
    Becerra, Yolanda
    Torres, Jordi
    Labarta, Jesus
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019 : 433 - 440
  • [6] Variable Granularity Space Filling Curve for Indexing Multidimensional Data
    Terry, Justin
    Stantic, Bela
    Terenziani, Paolo
    Sattar, Abdul
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2011, 6909 : 111 - +
  • [7] Distributed Multidimensional Data Indexing Strategy in Cloud Computing Environment
    He Dinghua
    2021 6TH INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA 2021), 2021 : 204 - 207
  • [8] A Data-Driven Multidimensional Indexing Method for Data Mining in Astrophysical Databases
    Marco Frailis
    Alessandro De Angelis
    Vito Roberto
    EURASIP Journal on Advances in Signal Processing, 2005
  • [9] A data-driven multidimensional indexing method for data mining in astrophysical databases
    Frailis, M
    De Angelis, A
    Roberto, V
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2005, 2005 (15) : 2514 - 2520
  • [10] An efficient peer-to-peer indexing tree structure for multidimensional data
    Zhang, Rong
    Qian, Weining
    Zhou, Aoying
    Zhou, Minqi
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2009, 25 (01) : 77 - 88