High-dimensional changepoint estimation with heterogeneous missingness

Cited by: 6
Authors
Follain, Bertille [1, 2]
Wang, Tengyao [3, 4]
Samworth, Richard J. [1]
Affiliations
[1] Univ Cambridge, Stat Lab, Cambridge, Cambs, England
[2] PSL Res Univ, INRIA, Ecole Normale Super, Paris, France
[3] London Sch Econ & Polit Sci, Dept Stat, London, England
[4] UCL, Dept Stat Sci, London, England
Funding
European Research Council; Engineering and Physical Sciences Research Council (UK)
Keywords
changepoint estimation; high-dimensional data; missing data; segmentation; sparsity; CHANGE-POINT DETECTION; MAXIMUM-LIKELIHOOD-ESTIMATION; BINARY SEGMENTATION; TIME-SERIES; SPARSE; NUMBER;
DOI
10.1111/rssb.12540
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a 'MissCUSUM' transformation (a generalisation of the popular cumulative sum statistics) that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data and on an oceanographic data set covering the Neogene period.
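The three-step pipeline described in the abstract (transform, project, locate the peak) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `miss_cusum`, `sparse_direction` and `estimate_changepoint` are hypothetical, the transform is simply a CUSUM computed over observed entries only and may differ in detail from the paper's MissCUSUM definition, and the paper's penalised optimisation problem is replaced here by a plainly simpler stand-in (entrywise soft-thresholding followed by the leading left singular vector). The threshold `lam` and the simulated data are likewise assumptions chosen for illustration.

```python
import numpy as np

def miss_cusum(X, observed):
    """CUSUM-type transform that uses observed entries only (illustrative).

    X: (p, n) data matrix; observed: (p, n) boolean mask.
    Returns a (p, n-1) matrix; entry (j, t-1) contrasts the observed means
    of coordinate j after and before time t, and is 0 if either side has
    no observations.
    """
    X = np.where(observed, X, 0.0)               # zero out missing entries
    obs = observed.astype(float)
    left_n = np.cumsum(obs, axis=1)[:, :-1]      # observed counts in 1..t
    right_n = obs.sum(axis=1, keepdims=True) - left_n
    left_s = np.cumsum(X, axis=1)[:, :-1]        # observed sums in 1..t
    right_s = X.sum(axis=1, keepdims=True) - left_s
    valid = (left_n > 0) & (right_n > 0)
    safe_l = np.where(valid, left_n, 1.0)        # avoid division by zero
    safe_r = np.where(valid, right_n, 1.0)
    scale = np.sqrt(safe_l * safe_r / (safe_l + safe_r))
    T = scale * (right_s / safe_r - left_s / safe_l)
    return np.where(valid, T, 0.0)

def sparse_direction(T, lam):
    """Stand-in for the paper's penalised optimisation (assumption):
    soft-threshold T entrywise, then take the leading left singular vector."""
    S = np.sign(T) * np.maximum(np.abs(T) - lam, 0.0)
    if not S.any():                              # everything thresholded away
        S = T
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0]

def estimate_changepoint(X, observed, lam):
    """Project the transformed series and return the peak location (1-based)."""
    T = miss_cusum(X, observed)
    v = sparse_direction(T, lam)
    return int(np.argmax(np.abs(v @ T))) + 1

# Simulated example (all settings are illustrative assumptions):
# 50 coordinates, 200 time points, mean shift of 2 in 5 coordinates after
# time 120, and a different observation probability for each coordinate.
rng = np.random.default_rng(0)
p, n, z, k = 50, 200, 120, 5
X = rng.standard_normal((p, n))
X[:k, z:] += 2.0
obs_prob = rng.uniform(0.3, 0.9, size=p)
observed = rng.random((p, n)) < obs_prob[:, None]
lam = np.sqrt(2 * np.log(p * n))
print(estimate_changepoint(X, observed, lam))
```

Because each coordinate is scaled by its own observed counts, coordinates with more observations contribute larger statistics for the same mean shift, which is the kind of signal-missingness interaction the abstract refers to.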
Pages: 1023-1055
Number of pages: 33