What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Times cited: 0
Authors
Raviv, Netanel [1]
Jain, Siddharth [2]
Bruck, Jehoshua [2]
Affiliations
[1] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[2] CALTECH, Dept Elect Engn, Pasadena, CA 91125 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020
DOI
10.1109/isit44484.2020.9174311
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
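The abstract defines quality through the expected diameter without stating a formula. The Python sketch below is one possible reading under assumptions that are ours, not the paper's: hypotheses are homogeneous Boolean halfspaces h_w(x) = sign(⟨w, x⟩) on {−1, +1}^n; w is drawn from a standard Gaussian (so its direction is uniform on the sphere); a hypothesis "explains" the dataset when it reproduces every label; and disagreement is the fraction of {−1, +1}^n on which two hypotheses differ. The expectation is estimated by rejection sampling and exhaustive enumeration of the cube, which is a swapped-in brute-force technique, not the Fourier-analytic or algebraic methods the authors describe. The names `halfspace`, `sample_consistent` and `expected_diameter` are hypothetical.

```python
import itertools
import random


def halfspace(w, x):
    """Hypothetical helper: homogeneous Boolean halfspace sign(<w, x>) with values in {-1, +1}."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    # s == 0 has probability 0 when w is Gaussian; map it to +1 by convention.
    return 1 if s >= 0 else -1


def sample_consistent(data, n, k, rng, max_tries=10**6):
    """Rejection-sample k Gaussian weight vectors whose halfspace matches every (x, label)."""
    found = []
    for _ in range(max_tries):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if all(halfspace(w, x) == label for x, label in data):
            found.append(w)
            if len(found) == k:
                return found
    raise RuntimeError("too few consistent hypotheses found; the version space may be tiny")


def expected_diameter(data, n, n_pairs=2000, seed=0):
    """Monte Carlo estimate of E[fraction of {-1,+1}^n where two consistent hypotheses differ]."""
    rng = random.Random(seed)
    cube = list(itertools.product((-1, 1), repeat=n))  # all 2^n Boolean inputs
    ws = sample_consistent(data, n, 2 * n_pairs, rng)
    total = 0.0
    # Pair up independent accepted samples and average their normalized Hamming distance.
    for w1, w2 in zip(ws[0::2], ws[1::2]):
        differ = sum(halfspace(w1, x) != halfspace(w2, x) for x in cube)
        total += differ / len(cube)
    return total / n_pairs


# Usage: an empty dataset versus a single labeled point (all-ones input labeled +1).
print(expected_diameter([], n=4))
print(expected_diameter([((1, 1, 1, 1), 1)], n=4))
```

With no data, a random halfspace labels each input +1 or −1 with equal probability, so the estimate should sit near 1/2. Adding a labeled point shrinks the set of consistent hypotheses, and the estimate should drop accordingly. Enumerating the cube is only feasible for small n.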
Pages: 2825-2830
Number of pages: 6
Related papers (50 in total)
  • [41] Data science and machine learning: Mathematical and statistical methods
    Lai, Yin-Ju
    Hsiao, Chuhsing Kate
    Botev, Zdravko
    BIOMETRICS, 2021, 77 (04) : 1503 - 1504
  • [42] MATHEMATICAL METHODS TO ASSURE CONFIDENTIALITY AND ANONYMITY OF RESEARCH DATA
    BORUCH, RF
    ENDRUWEI.G
    ZEITSCHRIFT FUR SOZIOLOGIE, 1973, 2 (03) : 227 - 238
  • [43] Transportability of data between electronic noses: mathematical methods
    Balaban, MO
    Korel, F
    Odabasi, AZ
    Folkes, G
    SENSORS AND ACTUATORS B-CHEMICAL, 2000, 71 (03) : 203 - 211
  • [44] COMPARISON OF VARIOUS MATHEMATICAL METHODS FOR CALCULATION OF RADIOIMMUNOASSAY DATA
    HERNDL, R
    MARSCHNER, I
    ACTA ENDOCRINOLOGICA, 1975, 78 : 117 - 117
  • [45] A new flexible Weibull extension model: Different estimation methods and modeling an extreme value data
    Alshanbari, Huda M.
    Odhah, Omalsad Hamood
    Al-Mofleh, Hazem
    Ahmad, Zubair
    Khosa, Saima K.
    El-Bagoury, Abd al-Aziz Hosni
    HELIYON, 2023, 9 (11)
  • [46] The Value of Privacy: What Does the Personal Data Mean to the Data Subject and Businesses?
    Serban, Andreea
    CURRENT ISSUES IN BUSINESS LAW, 2018 : 116 - 125
  • [47] What to believe: Bayesian methods for data analysis
    Kruschke, John K.
    TRENDS IN COGNITIVE SCIENCES, 2010, 14 (07) : 293 - 300
  • [48] Data Value Estimation for Privacy-Preserving Big/Personal Data Businesses
    Kiyomoto, Shinsaku
    APPLICATIONS + PRACTICAL CONCEPTUALIZATION + MATHEMATICS = FRUITFUL INNOVATION, 2016, 11 : 149 - 158
  • [49] A Mathematical Framework for Data Quality Management in Enterprise Systems
    Bai, Xue
    INFORMS JOURNAL ON COMPUTING, 2012, 24 (04) : 648 - 664
  • [50] Advances in Time Estimation Methods for Molecular Data
    Kumar, Sudhir
    Hedges, S. Blair
    MOLECULAR BIOLOGY AND EVOLUTION, 2016, 33 (04) : 863 - 869