Robust structured heterogeneity analysis approach for high-dimensional data

Cited by: 3
Authors
Sun, Yifan [1, 2]
Luo, Ziye [2]
Fan, Xinyan [1, 2]
Affiliations
[1] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[2] Renmin Univ China, Sch Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
high-dimensional data; overlapping clusters; robustness; subgroup identification; DIVERGING NUMBER; FINITE MIXTURE; REGRESSION; SELECTION; QM;
DOI
10.1002/sim.9414
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies, and disease heterogeneity makes it harder. Patients with what appears to be the same disease may form multiple subgroups, each with a distinct set of important genes. It is therefore imperative to discover the latent subgroups and identify the subgroup-specific important genes. Several heterogeneity analysis methods have been proposed in the recent literature. Despite considerable success, most existing studies remain limited because they cannot accommodate data contamination and ignore the interconnections among genes. To address these limitations, we develop a robust structured heterogeneity analysis approach that identifies subgroups, selects important genes, and estimates their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed for regularized estimation and gene identification while taking into account the possibly overlapping cluster structure of genes. The approach adopts an iterative strategy in a spirit similar to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
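The three ingredients named in the abstract (Huber loss, a sparse penalty over possibly overlapping gene groups, and a K-means-style alternation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names, the default threshold delta=1.345, and the particular penalty form (an L1 term plus a sum of unweighted group L2 norms) are assumptions chosen for illustration. For brevity the refit step uses a small ridge term in place of the sparse overlapping group lasso penalty.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond (delta is an assumed default)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def overlap_group_penalty(beta, groups, lam1, lam2):
    """One common penalty form (an assumption, not necessarily the paper's):
    lam1 * ||beta||_1 + lam2 * sum over groups of ||beta_g||_2.
    Groups are lists of indices and may share indices (overlap)."""
    l1 = np.sum(np.abs(beta))
    grp = sum(np.linalg.norm(beta[list(g)]) for g in groups)
    return lam1 * l1 + lam2 * grp

def huber_fit(X, y, delta=1.345, ridge=1e-3, n_iter=50):
    """Linear fit under Huber loss via iteratively reweighted least squares.
    The ridge term is a stand-in for the paper's penalty (simplification)."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 inside the threshold, delta/|r| outside.
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X + ridge * np.eye(p), XtW @ y)
    return beta

def cluster_regress(X, y, k=2, n_rounds=20, seed=0):
    """K-means-style alternation: refit one model per subgroup, then reassign
    each sample to the subgroup whose model gives it the smallest Huber loss."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(y))  # random initial assignment
    for _ in range(n_rounds):
        betas = [huber_fit(X[labels == g], y[labels == g]) for g in range(k)]
        losses = np.stack([huber_loss(y - X @ b) for b in betas], axis=1)
        new_labels = losses.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: betas match the returned labels
        labels = new_labels
    return labels, betas
```

Like K-means, this kind of alternation can stop at a local optimum, so in practice several random starts (different `seed` values here) would typically be compared.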
Pages: 3229-3259
Number of pages: 31