What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Cited by: 0
Authors
Raviv, Netanel [1]
Jain, Siddharth [2]
Bruck, Jehoshua [2]
Affiliations
[1] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[2] CALTECH, Dept Elect Engn, Pasadena, CA 91125 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020
DOI
10.1109/isit44484.2020.9174311
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
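The expected diameter described in the abstract can be illustrated with a brute-force sketch for very small dimensions. This is not the authors' algorithm (the abstract points to Fourier analytic, algebraic, and probabilistic methods for that); it only enumerates everything directly. It assumes, as an illustration and not as something the abstract states, that a Boolean hyperplane is the function x -> sign(w . x) with w and x in {-1, +1}^n for odd n, that a hypothesis "explains" the dataset when it reproduces every label, and that both the two hypotheses and the test point are drawn uniformly. The names `sign_dot` and `expected_diameter` are hypothetical.

```python
from itertools import product


def sign_dot(w, x):
    """Sign of the inner product w . x (never zero when n is odd)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def expected_diameter(dataset, n):
    """Brute-force expected disagreement between two consistent hypotheses.

    dataset: list of (x, y) pairs, x a tuple in {-1, +1}^n, y in {-1, +1}.
    n: odd dimension, small enough that 2^n vectors can be enumerated.
    Assumption: hypotheses are w in {-1, +1}^n, drawn uniformly and
    independently from those that label every dataset point correctly.
    """
    cube = list(product((-1, 1), repeat=n))

    # Hypotheses that explain the data (the consistent ones).
    consistent = [
        w for w in cube
        if all(sign_dot(w, x) == y for x, y in dataset)
    ]
    if not consistent:
        raise ValueError("no hypothesis is consistent with the dataset")

    total = 0.0
    for w1 in consistent:
        for w2 in consistent:
            # Fraction of points on which the two hypotheses disagree.
            differing = sum(sign_dot(w1, x) != sign_dot(w2, x) for x in cube)
            total += differing / len(cube)

    # Average over all ordered pairs, including w1 == w2.
    return total / len(consistent) ** 2


if __name__ == "__main__":
    n = 3
    cube = list(product((-1, 1), repeat=n))
    target = (1, 1, 1)
    fully_labeled = [(x, sign_dot(target, x)) for x in cube]
    print(expected_diameter([], n))             # no data at all
    print(expected_diameter(fully_labeled, n))  # every point labeled
```

Under these assumptions a smaller value means the data leaves less room for consistent hypotheses to disagree, which matches the abstract's reading of the expected diameter as a quality measure. The enumeration costs on the order of 8^n sign evaluations, so it is only usable for toy dimensions.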
Pages: 2825-2830
Number of pages: 6