Robust and Scalable Column/Row Sampling from Corrupted Big Data

Cited by: 3
Authors
Rahmani, Mostafa [1]
Atia, George [1]
Affiliations
[1] Univ Cent Florida, Orlando, FL 32816 USA
Keywords
MATRIX; FACTORIZATION; ALGORITHMS;
DOI
10.1109/ICCVW.2017.215
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional sampling techniques fall short of drawing descriptive sketches of the data when the data is grossly corrupted, as such corruptions break the low-rank structure required for them to perform satisfactorily. In this paper, we present new sampling algorithms which can locate the informative columns in the presence of severe data corruptions. In addition, we develop new scalable randomized designs of the proposed algorithms. The proposed approach is simultaneously robust to sparse corruption and outliers and substantially outperforms the state-of-the-art robust sampling algorithms, as demonstrated by experiments conducted using both real and synthetic data.
Pages: 1818 - 1826
Number of pages: 9
Related papers
50 in total
  • [21] Parallel sampling from big data with uncertainty distribution
    He, Qing
    Wang, Haocheng
    Zhuang, Fuzhen
    Shang, Tianfeng
    Shi, Zhongzhi
    FUZZY SETS AND SYSTEMS, 2015, 258 : 117 - 133
  • [22] Row–Column Sampling Design Using Auxiliary Ranking Variables
    Omer Ozturk
    Olena Kravchuk
    Raymond Correll
    Journal of Agricultural, Biological and Environmental Statistics, 2022, 27 : 652 - 673
  • [23] Sampling and Sampling Frames in Big Data Epidemiology
    Mooney, Stephen J.
    Garber, Michael D.
    CURRENT EPIDEMIOLOGY REPORTS, 2019, 6 (01) : 14 - 22
  • [24] Sampling and Sampling Frames in Big Data Epidemiology
    Stephen J. Mooney
    Michael D. Garber
    Current Epidemiology Reports, 2019, 6 : 14 - 22
  • [25] Sampling for Big Data: A Tutorial
    Cormode, Graham
    Duffield, Nick
    PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 1975 - 1975
  • [26] Scalable Euclidean Embedding for Big Data
    Alavi, Zohreh
    Sharma, Sagar
    Zhou, Lu
    Chen, Keke
    2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 773 - 780
  • [27] A Scalable Big Data Test Framework
    Li, Nan
    Escalona, Anthony
    Guo, Yun
    Offutt, Jeff
    2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), 2015,
  • [28] Sampling Operations on Big Data
    Gadepally, Vijay
    Herr, Taylor
    Johnson, Luke
    Milechin, Lauren
    Milosavljevic, Maja
    Miller, Benjamin A.
    2015 49TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2015, : 1515 - 1519
  • [29] Clouds for scalable Big Data processing
    Trunfio, Paolo
    Vlassov, Vladimir
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 629 - 631
  • [30] Clouds for Scalable Big Data Analytics
    Talia, Domenico
    COMPUTER, 2013, 46 (05) : 98 - 101