Random indexing of multidimensional data

Cited by: 6
Authors
Sandin, Fredrik [1]
Emruli, Blerim [2]
Sahlgren, Magnus [3]
Affiliations
[1] Lulea Univ Technol, EISLAB, S-97187 Lulea, Sweden
[2] SICS Swedish ICT, S-72213 Vasteras, Sweden
[3] SICS Swedish ICT, S-16429 Kista, Sweden
Keywords
Data mining; Random embeddings; Dimensionality reduction; Sparse coding; Semantic similarity; Streaming algorithm; Natural language processing; JOHNSON-LINDENSTRAUSS;
DOI
10.1007/s10115-016-1012-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Random indexing (RI) is a lightweight dimension reduction method that is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and thereby enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which are also the theoretical basis of ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments which demonstrate that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided.
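As a rough illustration of the incremental update and decoding described in the abstract, the following Python sketch stores weights for pairs of keys in a fixed-size two-dimensional state array. It is not the authors' open source implementation: the class name `RandomIndex2D`, the helper `index_vector`, the parameter values, and the seeding scheme are all assumptions made for this example. It assumes that each key is assigned a sparse ternary index vector, that a weight is encoded by adding a scaled outer product of two index vectors to the state, and that it is decoded by the matching inner product.

```python
import random


def index_vector(label, dim, nonzeros, seed=0):
    """Return a sparse ternary index vector as {position: +1 or -1}.

    Hypothetical helper: the vector is derived deterministically from
    `label`, so it never has to be stored.
    """
    rng = random.Random(f"{seed}:{label}")
    positions = rng.sample(range(dim), nonzeros)
    # Alternate signs so that half the nonzero entries are +1 and half are -1.
    return {p: (1 if k % 2 == 0 else -1) for k, p in enumerate(positions)}


class RandomIndex2D:
    """Fixed-size approximate store for weights w[i, j] (illustrative only)."""

    def __init__(self, dim=64, nonzeros=4, seed=0):
        self.dim = dim
        self.nonzeros = nonzeros
        self.seed = seed
        # The state size depends only on `dim`, not on the number of keys.
        self.state = [[0.0] * dim for _ in range(dim)]

    def _vec(self, axis, key):
        return index_vector(f"{axis}:{key}", self.dim, self.nonzeros, self.seed)

    def update(self, i, j, weight):
        """Incrementally add `weight` to the relationship (i, j)."""
        ri, rj = self._vec(0, i), self._vec(1, j)
        for a, sa in ri.items():
            for b, sb in rj.items():
                self.state[a][b] += weight * sa * sb

    def decode(self, i, j):
        """Approximate the accumulated weight of (i, j)."""
        ri, rj = self._vec(0, i), self._vec(1, j)
        total = sum(self.state[a][b] * sa * sb
                    for a, sa in ri.items()
                    for b, sb in rj.items())
        return total / (self.nonzeros ** 2)


ri = RandomIndex2D()
ri.update("cat", "purrs", 3.0)
ri.update("cat", "purrs", 2.0)  # streaming update of the same relationship
print(ri.decode("cat", "purrs"))
```

With a single stored relationship the decoded value is exact. Once many relationships share the same state array, their index vectors can overlap, so each decoded weight becomes an approximation whose error depends on the chosen `dim` and `nonzeros`.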
Pages: 267 - 290
Page count: 24
Related papers
50 in total
  • [41] Variable screening for Lasso based on multidimensional indexing
    Zogala-Siudem, Barbara
    Jaroszewicz, Szymon
    DATA MINING AND KNOWLEDGE DISCOVERY, 2024, 38 (01) : 49 - 78
  • [42] Using Unbalanced Trees for Indexing Multidimensional Objects
    Charu Aggarwal
    Joel Wolf
    Philip Yu
    Marina Epelman
    Knowledge and Information Systems, 1999, 1 (3) : 309 - 336
  • [43] Using Unbalanced Trees for Indexing Multidimensional Objects
    Aggarwal, Charu
    Wolf, Joel
    Yu, Philip
    Epelman, Marina
    Knowledge and Information Systems, 1999, 1 (03) : 309 - 336
  • [44] Multidimensional indexing technique for medical images retrieval
    Safaei, Ali Asghar
    Habibi-Asl, Saeede
    INTELLIGENT DATA ANALYSIS, 2021, 25 (06) : 1629 - 1666
  • [45] Analyzing design choices for distributed multidimensional indexing
    Nam, Beomseok
    Sussman, Alan
    JOURNAL OF SUPERCOMPUTING, 2012, 59 (03) : 1552 - 1576
  • [46] A dynamic programming scheme with multidimensional step indexing
    Levit-Gurevich, L. K.
    Yaroshevskii, D. M.
    AUTOMATION AND REMOTE CONTROL, 2006, 67 (09) : 1373 - 1388
  • [47] On a backward problem for multidimensional Ginzburg-Landau equation with random data
    Kirane, Mokhtar
    Nane, Erkan
    Nguyen Huy Tuan
    INVERSE PROBLEMS, 2018, 34 (01)
  • [48] Reflective random indexing for semi-automatic indexing of the biomedical literature
    Vasuki, Vidya
    Cohen, Trevor
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (05) : 694 - 700
  • [49] A space-partitioning-based indexing method for multidimensional non-ordered discrete data spaces
    Qian, G
    Zhu, Q
    Xue, Q
    Pramanik, S
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2006, 24 (01) : 79 - 110
  • [50] Exploring Random Indexing for Profile Learning
    Fonseca Bruzon, Adrian
    Lopez-Lopez, Aurelio
    Medina Pagola, Jose
    FUTURE AND EMERGENT TRENDS IN LANGUAGE TECHNOLOGY, FETLT 2015, 2016, 9577 : 77 - 85