Robust and Scalable Column/Row Sampling from Corrupted Big Data

Cited by: 3
Authors
Rahmani, Mostafa [1]
Atia, George [1]
Affiliation
[1] Univ Cent Florida, Orlando, FL 32816 USA
Keywords
MATRIX; FACTORIZATION; ALGORITHMS
DOI
10.1109/ICCVW.2017.215
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Conventional sampling techniques fall short of drawing descriptive sketches of the data when the data is grossly corrupted, since such corruptions break the low-rank structure these techniques require to perform satisfactorily. In this paper, we present new sampling algorithms that can locate the informative columns in the presence of severe data corruption. In addition, we develop new scalable randomized designs of the proposed algorithms. The proposed approach is simultaneously robust to sparse corruption and outliers, and it substantially outperforms state-of-the-art robust sampling algorithms, as demonstrated by experiments on both real and synthetic data.
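The abstract's opening claim, that gross corruption breaks the low-rank structure conventional samplers rely on, can be illustrated with a small NumPy experiment. This is a hedged sketch of a conventional, non-robust baseline (rank-r column leverage-score sampling), not the authors' robust algorithm; the matrix sizes, the outlier magnitude, the random seed, and the helper name `top_leverage_columns` are all illustrative assumptions.

```python
import numpy as np


def top_leverage_columns(D: np.ndarray, r: int, k: int) -> list[int]:
    """Indices of the k columns of D with the largest rank-r leverage scores.

    The leverage score of column j is the squared norm of column j of V_r^T,
    where D = U S V^T is the thin SVD. This is a conventional, non-robust
    sampling rule, used here only as a baseline (hypothetical helper).
    """
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    scores = np.sum(vt[:r, :] ** 2, axis=0)
    # Take the k largest scores, then sort the indices ascending for readability.
    return sorted(int(j) for j in np.argsort(scores)[::-1][:k])


rng = np.random.default_rng(0)
n, m, r = 50, 200, 3  # illustrative sizes (assumption)

# Clean data: every column lies in the same r-dimensional subspace.
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Corrupted data: a few columns replaced by large outliers off that subspace.
outliers = [10, 20, 30]
D = L.copy()
D[:, outliers] = 50.0 * rng.standard_normal((n, len(outliers)))

print("rank of clean data:    ", np.linalg.matrix_rank(L))
print("rank of corrupted data:", np.linalg.matrix_rank(D))
print("columns picked by leverage sampling:", top_leverage_columns(D, r, 3))
```

Under these assumed settings the numerical rank rises from 3 to 6, and the three columns chosen by the leverage rule are exactly the outliers, so the sketch spends its entire budget on corrupted columns rather than on columns that describe the clean subspace.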
Pages: 1818-1826
Number of pages: 9