A distance metric-based space-filling subsampling method for nonparametric models

Authors
Diao, Huaimi [1]
Wang, Dianpeng [1]
He, Xu [2]
Affiliations
[1] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, MADIS, Beijing, Peoples R China
Source
ELECTRONIC JOURNAL OF STATISTICS | 2024 / Vol. 18 / No. 2
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Big data; nonparametric model; space-filling design; tall data
DOI
10.1214/24-EJS2251
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Taking subsamples from the original data set is an efficient and popular strategy for handling massive data that are too large to be modeled directly. To optimize inference and prediction accuracy, it is crucial to employ a subsampling scheme that collects observations intelligently. In this paper, we propose a space-filling subsampling method that uses distance metric-based strata to select subsamples from high-volume data sets. To minimize the maximal distance between pairs of samples located in the same stratum, the input space is partitioned into the Voronoi cells of thinnest covering lattices. In addition, subsamples that are space-filling with respect to the response are collected from each stratum. With the help of an algorithm that quickly identifies the cell in which an observation is located, the computational cost of our subsampling method is proportional to the number of observations and independent of the number of cells, which makes the method applicable to extremely large data sets. Results from simulation studies and real data analysis show that the new method performs markedly better than existing approaches when used in conjunction with Gaussian process models.
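A minimal sketch of the two ingredients described in the abstract, under simplifying assumptions that are ours and not the authors': the input space is two-dimensional, where the hexagonal lattice is the thinnest covering lattice, and the response-based selection inside a stratum simply keeps observations whose responses are evenly spaced in rank. The helper names `hex_cell` and `space_filling_subsample` and the lattice spacing `h` are hypothetical. The cell lookup writes the hexagonal lattice as the union of two shifted rectangular lattices and rounds in each, so finding an observation's Voronoi cell takes constant time no matter how many cells exist.

```python
import math
from collections import defaultdict


def hex_cell(x, y, h=1.0):
    """Key of the hexagonal-lattice point nearest to (x, y) (hypothetical helper).

    The hexagonal lattice with minimum spacing h is the union of two cosets
    of a rectangular lattice with steps (h, s), where s = h*sqrt(3):
      coset 0: (h*i, s*j)
      coset 1: (h*(i + 0.5), s*(j + 0.5))
    Rounding gives the nearest point in each coset; the closer candidate is
    the nearest lattice point, i.e. the Voronoi cell containing (x, y).
    Cost is O(1), independent of the number of cells.
    """
    s = h * math.sqrt(3)
    # Nearest point of coset 0.
    i0, j0 = round(x / h), round(y / s)
    d0 = (x - i0 * h) ** 2 + (y - j0 * s) ** 2
    # Nearest point of coset 1 (shifted by half a step in each direction).
    i1, j1 = round(x / h - 0.5), round(y / s - 0.5)
    d1 = (x - (i1 + 0.5) * h) ** 2 + (y - (j1 + 0.5) * s) ** 2
    return (0, i0, j0) if d0 <= d1 else (1, i1, j1)


def space_filling_subsample(points, responses, n_per_cell, h=1.0):
    """Indices of a stratified subsample (hypothetical illustration).

    Step 1: assign every observation to its hexagonal Voronoi cell (stratum).
    Step 2 (assumed rule, not the paper's): within each cell, keep up to
    n_per_cell observations whose responses are evenly spaced in rank,
    from the smallest to the largest response.
    """
    if n_per_cell < 1:
        raise ValueError("n_per_cell must be at least 1")
    strata = defaultdict(list)
    for idx, (x, y) in enumerate(points):  # single pass over the data
        strata[hex_cell(x, y, h)].append(idx)

    chosen = []
    for members in strata.values():
        members.sort(key=lambda k: responses[k])
        m = len(members)
        if m <= n_per_cell:
            chosen.extend(members)  # small stratum: keep everything
        elif n_per_cell == 1:
            chosen.append(members[m // 2])  # median response
        else:
            step = (m - 1) / (n_per_cell - 1)  # step > 1, so ranks are distinct
            chosen.extend(members[round(t * step)] for t in range(n_per_cell))
    return sorted(chosen)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (3.0, 0.0)]
    ys = [5.0, 1.0, 3.0, 2.0, 4.0, 0.0]
    print(space_filling_subsample(pts, ys, n_per_cell=3))  # [0, 1, 2, 5]
```

Running the script prints `[0, 1, 2, 5]`: the five points near the origin share one cell, from which the observations with the smallest, middle, and largest responses are kept, while the isolated point at (3, 0) forms its own stratum. Cell assignment is a single pass over the data, in line with the abstract's point that cost scales with the number of observations rather than the number of cells; the per-cell sort used here adds a logarithmic factor and is only a stand-in for the paper's response-based selection. Inputs of higher dimension would need a nearest-point routine for the corresponding d-dimensional lattice, which is not shown.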
Pages: 3247-3273
Number of pages: 27