Testing Identity of Multidimensional Histograms

Authors
Diakonikolas, Ilias [1]
Kane, Daniel M. [2]
Peebles, John [3]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] Univ Calif San Diego, La Jolla, CA USA
[3] MIT, Cambridge, MA USA
Keywords
distribution testing; hypothesis testing; goodness of fit; multivariate histograms
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the problem of identity testing for multidimensional histogram distributions. A distribution $p : D \to \mathbb{R}_+$, where $D \subseteq \mathbb{R}^d$, is called a $k$-histogram if there exists a partition of the domain into $k$ axis-aligned rectangles such that $p$ is constant within each such rectangle. Histograms are one of the most fundamental nonparametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. In more detail, let $q$ be an unknown $d$-dimensional $k$-histogram distribution in fixed dimension $d$, and $p$ be an explicitly given $d$-dimensional $k$-histogram. We want to correctly distinguish, with probability at least $2/3$, between the case that $p = q$ versus $\|p - q\|_1 \geq \epsilon$. We design an algorithm for this hypothesis testing problem with sample complexity $O\big((\sqrt{k}/\epsilon^2) \, 2^{d/2} \log^{2.5d}(k/\epsilon)\big)$ that runs in sample-polynomial time. Our algorithm is robust to model misspecification, i.e., succeeds even if $q$ is only promised to be close to a $k$-histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a sample complexity lower bound of $(\sqrt{k}/\epsilon^2) \cdot \Omega(\log(k)/d)^{d-1}$ when $d \geq 2$. That is, for any fixed dimension $d$, our upper and lower bounds are nearly matching. Prior to our work, the sample complexity of the $d = 1$ case was well-understood, but no algorithm with sub-learning sample complexity was known, even for $d = 2$. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
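To make the definitions in the abstract concrete, the following is a minimal Python sketch of a $k$-histogram and of the $\ell_1$ distance between two of them. It is not the tester from the paper: it assumes both histograms are fully known, whereas the testing problem only grants samples from $q$. The representation (a list of rectangle/density pairs) and the names `density` and `l1_distance` are hypothetical choices made for illustration.

```python
from itertools import product

# Assumed representation: a histogram is a list of (rectangle, density) pairs.
# A rectangle is a tuple of half-open intervals (low, high), one per axis.
# The rectangles are assumed to partition the domain.

def density(hist, x):
    """Return the constant density of the rectangle containing point x."""
    for rect, value in hist:
        if all(lo <= xi < hi for (lo, hi), xi in zip(rect, x)):
            return value
    return 0.0  # x lies outside the domain

def l1_distance(p, q):
    """Integrate |p - q| exactly over the common refinement grid."""
    d = len(p[0][0])
    # Collect every rectangle boundary along each axis from both histograms.
    cuts = []
    for axis in range(d):
        points = set()
        for hist in (p, q):
            for rect, _ in hist:
                points.update(rect[axis])
        cuts.append(sorted(points))
    total = 0.0
    # Each grid cell lies inside one rectangle of p and one rectangle of q,
    # so both densities are constant on it and the midpoint value is exact.
    for cell in product(*[list(zip(c[:-1], c[1:])) for c in cuts]):
        volume = 1.0
        for lo, hi in cell:
            volume *= hi - lo
        midpoint = tuple((lo + hi) / 2 for lo, hi in cell)
        total += abs(density(p, midpoint) - density(q, midpoint)) * volume
    return total

# Example in d = 2 on the unit square: a 1-histogram versus a 2-histogram.
p = [(((0.0, 1.0), (0.0, 1.0)), 1.0)]
q = [(((0.0, 0.5), (0.0, 1.0)), 1.5),
     (((0.5, 1.0), (0.0, 1.0)), 0.5)]
distance = l1_distance(p, q)
```

In this example `p` is uniform and `q` shifts mass toward the left half of the square, so the two cases the tester must separate correspond to `distance` being zero versus at least $\epsilon$.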
Pages: 25