Robust structured heterogeneity analysis approach for high-dimensional data

Cited by: 3
Authors
Sun, Yifan [1, 2]
Luo, Ziye [2]
Fan, Xinyan [1, 2]
Affiliations
[1] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[2] Renmin Univ China, Sch Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
high-dimensional data; overlapping clusters; robustness; subgroup identification; DIVERGING NUMBER; FINITE MIXTURE; REGRESSION; SELECTION; QM;
DOI
10.1002/sim.9414
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies, and one made harder by disease heterogeneity. Patients with what is perceived to be the same disease may form multiple subgroups, and different subgroups have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in the recent literature. Despite considerable successes, most existing studies remain limited because they cannot accommodate data contamination and ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach that identifies subgroups, selects important genes, and estimates their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularized estimation and gene identification while taking into account the possibly overlapping cluster structure of genes. The approach takes an iterative strategy in a spirit similar to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
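As a rough illustration of two ingredients named in the abstract, the sketch below shows the standard Huber loss and the assignment half of a K-means-style alternation, in which each sample is assigned to the subgroup whose regression coefficients give it the smallest Huber loss. This is not the authors' algorithm: the function names, the threshold `delta=1.345`, and the array layout are assumptions made for illustration, and the refitting step with the sparse overlapping group lasso penalty is omitted entirely.

```python
import numpy as np


def huber_loss(residuals, delta=1.345):
    """Elementwise Huber loss: quadratic for |r| <= delta, linear beyond it.

    delta=1.345 is a common default choice, assumed here for illustration.
    """
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)


def assign_subgroups(X, y, betas, delta=1.345):
    """Assign each sample to the subgroup with the smallest Huber loss.

    X: (n, p) design matrix, y: (n,) responses,
    betas: (K, p) current coefficient estimates, one row per subgroup.
    Returns an (n,) array of subgroup labels in {0, ..., K-1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = np.asarray(betas, dtype=float)
    residuals = y[:, None] - X @ betas.T  # shape (n, K)
    return np.argmin(huber_loss(residuals, delta), axis=1)


if __name__ == "__main__":
    # Toy data: intercept-only model with two hypothetical subgroups.
    X = np.ones((3, 1))
    y = np.array([0.1, 2.1, -0.2])
    betas = np.array([[0.0], [2.0]])
    print(assign_subgroups(X, y, betas))
```

In a full procedure, this assignment step would alternate with a penalized robust refit of each subgroup's coefficients until the labels stop changing, which is the sense in which the strategy resembles K-means.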
Pages: 3229-3259
Page count: 31
Related papers
50 in total
  • [1] Robust analysis of cancer heterogeneity for high-dimensional data
    Cheng, Chao
    Feng, Xingdong
    Li, Xiaoguang
    Wu, Mengyun
    STATISTICS IN MEDICINE, 2022, 41 (27) : 5448 - 5462
  • [2] Software Tools for Robust Analysis of High-Dimensional Data
    Todorov, Valentin
    Filzmoser, Peter
    AUSTRIAN JOURNAL OF STATISTICS, 2014, 43 (04) : 255 - 266
  • [3] Robust regularized cluster analysis for high-dimensional data
    Kalina, Jan
    Vlckova, Katarina
    MATHEMATICAL METHODS IN ECONOMICS (MME 2014), 2014, : 378 - 383
  • [4] Robust PCA for high-dimensional data
    Hubert, M
    Rousseeuw, PJ
    Verboven, S
    DEVELOPMENTS IN ROBUST STATISTICS, 2003, : 169 - 179
  • [5] A Data-dependent Approach for High-dimensional (Robust) Wasserstein Alignment
    Ding H.
    Liu W.
    Ye M.
    ACM Journal of Experimental Algorithmics, 2023, 28 (1-2):
  • [6] Fast Robust Correlation for High-Dimensional Data
    Raymaekers, Jakob
    Rousseeuw, Peter J.
    TECHNOMETRICS, 2021, 63 (02) : 184 - 198
  • [7] Robust Ridge Regression for High-Dimensional Data
    Maronna, Ricardo A.
    TECHNOMETRICS, 2011, 53 (01) : 44 - 53
  • [8] Structured analysis of the high-dimensional FMR model
    Liu, Mengque
    Zhang, Qingzhao
    Fang, Kuangnan
    Ma, Shuangge
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 144
  • [9] Robust high-dimensional regression for data with anomalous responses
    Mingyang Ren
    Sanguo Zhang
    Qingzhao Zhang
    Annals of the Institute of Statistical Mathematics, 2021, 73 : 703 - 736
  • [10] On Coupling Robust Estimation with Regularization for High-Dimensional Data
    Kalina, Jan
    Hlinka, Jaroslav
    DATA SCIENCE: INNOVATIVE DEVELOPMENTS IN DATA ANALYSIS AND CLUSTERING, 2017, : 15 - 27