Encoding of data sets and algorithms

Cited by: 0
Authors
Doctor, Katarina [1]
Mao, Tong [2]
Mhaskar, Hrushikesh [2]
Affiliations
[1] US Naval Res Lab, Navy Ctr Appl Res AI, Informat Technol Div, Washington, DC 20375 USA
[2] Claremont Grad Univ, Inst Math Sci, Claremont, CA 91711 USA
Keywords
Metric entropy; Covering number; Analytic functions; Entire functions; Entropy of class of functionals
DOI
10.1016/j.apnum.2023.07.013
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In many high-impact applications, it is important to ensure the quality of the output of a machine learning algorithm as well as its reliability in comparison to the complexity of the algorithm used. In this paper, we have initiated a mathematically rigorous theory to decide which models (algorithms applied on data sets) are close to each other in terms of certain metrics, such as performance and the complexity level of the algorithm. This involves creating a grid on the hypothetical spaces of data sets and algorithms so as to identify a finite set of probability distributions from which the data sets are sampled and a finite set of algorithms. A given threshold metric acting on this grid will express the nearness (or statistical distance) of each algorithm and data set of interest to any given application. A technically difficult part of this project is to estimate the so-called metric entropy of a compact subset of functions of infinitely many variables that arise in the definition of these spaces. © 2023 The Authors. Published by Elsevier B.V. on behalf of IMACS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
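The covering-number idea behind the "grid" and the metric entropy mentioned in the abstract can be illustrated on a toy example. The following is a minimal sketch under stated assumptions, not the authors' construction: it works on a finite set of points with a user-supplied metric rather than on the classes of functions of infinitely many variables treated in the paper, and the names `greedy_epsilon_net` and `entropy_upper_bound` are hypothetical. A greedy pass selects centers so that every point lies within `eps` of some center; the base-2 logarithm of the number of centers is then an upper bound on the metric entropy of the finite set at scale `eps`, taking metric entropy to mean the logarithm of the covering number.

```python
import math


def greedy_epsilon_net(points, eps, dist):
    """Greedily pick centers so every point is within eps of some center.

    The chosen centers are pairwise more than eps apart, so their count
    bounds the eps-covering number of `points` from above.
    """
    centers = []
    for p in points:
        # p becomes a new center only if no existing center covers it.
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


def entropy_upper_bound(points, eps, dist):
    """log2 of the greedy net size: an upper bound on the metric entropy."""
    return math.log2(len(greedy_epsilon_net(points, eps, dist)))


# Toy metric space (an assumption for illustration): the integers 0..100
# with the absolute-value metric, covered at scale eps = 10.
points = list(range(101))
eps = 10


def abs_dist(a, b):
    return abs(a - b)


net = greedy_epsilon_net(points, eps, abs_dist)
bound = entropy_upper_bound(points, eps, abs_dist)
print(len(net), bound)
```

The greedy net is generally not a minimal cover, so the value it yields is only an upper bound; estimating the true metric entropy of the infinite-dimensional classes is the part the abstract describes as technically difficult.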
Pages: 209-235
Number of pages: 27