Multidimensional scaling of noisy high dimensional data

Cited by: 9
Authors
Peterfreund, Erez [1 ,2 ]
Gavish, Matan [1 ,2 ]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, Jerusalem, Israel
Funding
Israel Science Foundation;
Keywords
Multidimensional scaling; Euclidean embedding; Dimensionality reduction; Singular value thresholding; Optimal shrinkage; MDS; LARGEST EIGENVALUE; SINGULAR-VALUES; MATRIX;
DOI
10.1016/j.acha.2020.11.006
Chinese Library Classification number
O29 [Applied Mathematics];
Subject classification code
070104 ;
Abstract
Multidimensional Scaling (MDS) is a classical technique for embedding data in low dimensions, still in widespread use today. In this paper we study MDS in a modern setting: specifically, high dimensions and ambient measurement noise. We show that as the ambient noise level increases, MDS suffers a sharp breakdown that depends on the data dimension and noise level, and derive an explicit formula for this breakdown point in the case of white noise. We then introduce MDS+, a simple variant of MDS, which applies a shrinkage nonlinearity to the eigenvalues of the MDS similarity matrix. Under a natural loss function measuring the embedding quality, we prove that MDS+ is the unique, asymptotically optimal shrinkage function. MDS+ offers improved embedding, sometimes significantly so, compared with MDS. Importantly, MDS+ calculates the optimal embedding dimension, into which the data should be embedded. (c) 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
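The pipeline the abstract describes (build the MDS similarity matrix, apply a nonlinearity to its eigenvalues, embed with whatever survives) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not give the MDS+ shrinkage formula, so a plain hard threshold stands in for it, and the names `classical_mds_with_shrinkage` and `hard_threshold` are hypothetical.

```python
import numpy as np


def classical_mds_with_shrinkage(X, shrink=None):
    """Classical MDS on the rows of X, with an optional eigenvalue shrinker.

    `shrink` maps an array of non-negative eigenvalues to shrunken values.
    It is a placeholder hook; the optimal MDS+ shrinker is NOT reproduced here.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Double centering gives the MDS similarity matrix S = -1/2 * H * D2 * H.
    H = np.eye(n) - np.ones((n, n)) / n
    S = -0.5 * H @ D2 @ H
    S = (S + S.T) / 2.0  # symmetrize against round-off
    # Eigendecomposition, sorted in decreasing eigenvalue order.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)
    evecs = evecs[:, order]
    if shrink is not None:
        evals = shrink(evals)
    # Eigenvalues shrunk to zero drop out, which fixes the embedding dimension.
    keep = evals > 0
    return evecs[:, keep] * np.sqrt(evals[keep])


def hard_threshold(cutoff):
    """Stand-in shrinker (an assumption): keep eigenvalues above `cutoff`."""
    return lambda ev: np.where(ev > cutoff, ev, 0.0)


# Usage: a 2-dimensional signal observed in 50 ambient dimensions with noise.
rng = np.random.default_rng(0)
signal = np.zeros((100, 50))
signal[:, :2] = rng.normal(scale=5.0, size=(100, 2))
noisy = signal + rng.normal(scale=0.1, size=signal.shape)
Y = classical_mds_with_shrinkage(noisy, shrink=hard_threshold(50.0))
```

The cutoff of 50.0 is chosen by hand for this toy data. The point of the hook is only that the number of eigenvalues surviving the shrinker sets the embedding dimension, rather than it being fixed in advance.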
Pages: 333-373
Number of pages: 41
Related papers
50 in total
  • [1] Visualization of high-dimensional data using an association of multidimensional scaling to clustering
    Naud, A
    2004 IEEE CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2004, : 252 - 255
  • [2] Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data
    Berrar, Daniel
    Ohmayer, Georg
    NEURAL COMPUTING & APPLICATIONS, 2011, 20 (08): : 1211 - 1218
  • [3] Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data
    Daniel Berrar
    Georg Ohmayer
    Neural Computing and Applications, 2011, 20 : 1211 - 1218
  • [4] Focused multidimensional scaling: interactive visualization for exploration of high-dimensional data
    Urpa, Lea M.
    Anders, Simon
    BMC BIOINFORMATICS, 2019, 20 (1)
  • [5] Focused multidimensional scaling: interactive visualization for exploration of high-dimensional data
    Lea M. Urpa
    Simon Anders
    BMC Bioinformatics, 20
  • [6] Symbolic Multidimensional Scaling Versus Noisy Variables and Outliers
    Pelka, Marcin
    CLASSIFICATION AS A TOOL FOR RESEARCH, 2010, : 341 - 349
  • [7] Improving the efficiency of multidimensional scaling in the analysis of high-dimensional data using singular value decomposition
    Becavin, Christophe
    Tchitchek, Nicolas
    Mintsa-Eya, Colette
    Lesne, Annick
    Benecke, Arndt
    BIOINFORMATICS, 2011, 27 (10) : 1413 - 1421
  • [8] Multidimensional scaling for big data
    Delicado, Pedro
    Pachon-Garcia, Cristian
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024,
  • [9] Data visualization with multidimensional scaling
    Buja, Andreas
    Swayne, Deborah F.
    Littman, Michael L.
    Dean, Nathaniel
    Hofmann, Heike
    Chen, Lisha
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2008, 17 (02) : 444 - 472
  • [10] Geometric classifiers for high-dimensional noisy data
    Ishii, Aki
    Yata, Kazuyoshi
    Aoshima, Makoto
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 188