Random indexing of multidimensional data

Cited by: 6
Authors
Sandin, Fredrik [1]
Emruli, Blerim [2]
Sahlgren, Magnus [3]
Affiliations
[1] Lulea Univ Technol, EISLAB, S-97187 Lulea, Sweden
[2] SICS Swedish ICT, S-72213 Vasteras, Sweden
[3] SICS Swedish ICT, S-16429 Kista, Sweden
Keywords
Data mining; Random embeddings; Dimensionality reduction; Sparse coding; Semantic similarity; Streaming algorithm; Natural language processing; Johnson-Lindenstrauss
DOI
10.1007/s10115-016-1012-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and therefore enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which is the theoretical basis also for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments which demonstrate that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided.
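The incremental update and inner-product decoding described in the abstract can be illustrated with a minimal Python sketch for the two-dimensional case. It assumes sparse ternary index vectors (a few entries set to +1 or -1, the rest zero) and a fixed-size state array that accumulates weighted outer products of those vectors. The class name `RandomIndex2D`, its parameters, and the example keys are hypothetical choices for illustration; this is not the authors' open source implementation.

```python
import numpy as np


def make_index(rng, dim, nonzeros):
    """Return a sparse ternary index vector as (positions, signs).

    Assumption: `nonzeros` is even, with half the entries +1 and half -1.
    """
    positions = rng.choice(dim, size=nonzeros, replace=False)
    signs = np.tile([1.0, -1.0], nonzeros // 2)
    return positions, signs


class RandomIndex2D:
    """Fixed-size state array approximating a (row key x column key) weight matrix."""

    def __init__(self, dim_rows, dim_cols, nonzeros=8, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dims = (dim_rows, dim_cols)
        self.nonzeros = nonzeros
        self.state = np.zeros(self.dims)
        self.index = ({}, {})  # one lookup table per array dimension

    def _lookup(self, axis, key):
        # Index vectors are created lazily the first time a key is seen.
        table = self.index[axis]
        if key not in table:
            table[key] = make_index(self.rng, self.dims[axis], self.nonzeros)
        return table[key]

    def update(self, row, col, weight=1.0):
        """Add `weight` to the relationship (row, col) incrementally."""
        rp, rs = self._lookup(0, row)
        cp, cs = self._lookup(1, col)
        # Only nonzeros**2 entries of the state array are touched.
        self.state[np.ix_(rp, cp)] += weight * np.outer(rs, cs)

    def decode(self, row, col):
        """Approximate the accumulated weight via an inner product."""
        rp, rs = self._lookup(0, row)
        cp, cs = self._lookup(1, col)
        return rs @ self.state[np.ix_(rp, cp)] @ cs / self.nonzeros**2


ri = RandomIndex2D(dim_rows=500, dim_cols=500)
ri.update("cat", "purrs", 5.0)
ri.update("dog", "barks", 2.0)
print(ri.decode("cat", "purrs"))  # close to 5.0, up to random-overlap noise
```

Each update and each decode touches only `nonzeros**2` entries of the state array, regardless of how many keys have been seen, which is what makes this kind of representation attractive for streaming data. The decoded value is exact when index vectors do not overlap and otherwise carries noise that grows with the number of stored relationships relative to the array size.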
Pages: 267-290
Number of pages: 24