What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Authors
Raviv, Netanel [1]
Jain, Siddharth [2]
Bruck, Jehoshua [2]
Affiliations
[1] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[2] CALTECH, Dept Elect Engn, Pasadena, CA 91125 USA
DOI
10.1109/isit44484.2020.9174311
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
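The expected diameter described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' algorithm (the abstract mentions Fourier analytic, algebraic, and probabilistic methods that are not reproduced here). It assumes, for illustration only, that a hypothesis is a homogeneous linear threshold function sign(w·x) on the Boolean cube {-1, 1}^n, that "randomly chosen" means Gaussian weights filtered by rejection sampling, and that disagreement is the fraction of cube points on which two hypotheses differ. All function names below are hypothetical.

```python
import itertools
import random


def predict(w, x):
    """Label of point x under the threshold hypothesis w (ties map to +1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def sample_consistent(dataset, n, rng, max_tries=100_000):
    """Rejection-sample Gaussian weights until the hypothesis explains the dataset.

    Assumption: Gaussian weights are an illustrative choice of prior,
    not necessarily the distribution used in the paper.
    """
    for _ in range(max_tries):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if all(predict(w, x) == y for x, y in dataset):
            return w
    raise RuntimeError("no consistent hypothesis found within max_tries")


def disagreement(w1, w2, n):
    """Fraction of the 2^n cube points on which the two hypotheses differ."""
    cube = itertools.product((-1, 1), repeat=n)
    return sum(predict(w1, x) != predict(w2, x) for x in cube) / 2 ** n


def expected_diameter(dataset, n, pairs=200, seed=0):
    """Average disagreement over independently sampled pairs of consistent hypotheses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        h1 = sample_consistent(dataset, n, rng)
        h2 = sample_consistent(dataset, n, rng)
        total += disagreement(h1, h2, n)
    return total / pairs


if __name__ == "__main__":
    n = 3
    # A dataset labeling every cube point by a majority vote pins down the hypothesis.
    full = [(x, predict((1, 1, 1), x)) for x in itertools.product((-1, 1), repeat=n)]
    print("no data:  ", expected_diameter([], n))
    print("full data:", expected_diameter(full, n))
```

Under these assumptions, a smaller estimate means the dataset leaves less room for disagreement among the hypotheses that explain it. The exhaustive enumeration of the cube and the rejection sampler only scale to small n.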
Pages: 2825-2830
Number of pages: 6