Robust structured heterogeneity analysis approach for high-dimensional data

Cited by: 3
Authors
Sun, Yifan [1, 2]
Luo, Ziye [2]
Fan, Xinyan [1, 2]
Affiliations
[1] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[2] Renmin Univ China, Sch Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
high-dimensional data; overlapping clusters; robustness; subgroup identification; DIVERGING NUMBER; FINITE MIXTURE; REGRESSION; SELECTION; QM;
DOI
10.1002/sim.9414
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies. This problem is complicated by the heterogeneity of diseases. Patients with what is perceived to be the same disease may form multiple subgroups, and different subgroups may have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in the recent literature. Despite considerable successes, most existing studies remain limited because they cannot accommodate data contamination and ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach to identify subgroups, select important genes, and estimate their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularized estimation and gene identification, while taking into account the possibly overlapping cluster structure of genes. The approach takes an iterative strategy similar in spirit to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
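As a rough illustration of two ingredients named in the abstract, the sketch below shows the Huber loss and a K-means-style reassignment step in which each sample is assigned to the subgroup whose regression coefficients fit it best. This is not the authors' implementation: the function names, the default threshold `delta=1.345`, and the array shapes are assumptions made for the example, and the penalized coefficient update (the sparse overlapping group lasso step) is deliberately left out.

```python
import numpy as np


def huber_loss(r, delta=1.345):
    """Elementwise Huber loss: quadratic for |r| <= delta, linear beyond it.

    `delta` is a hypothetical default chosen for illustration only.
    """
    r = np.asarray(r, dtype=float)
    abs_r = np.abs(r)
    return np.where(abs_r <= delta, 0.5 * r**2, delta * (abs_r - 0.5 * delta))


def assign_subgroups(X, y, betas, delta=1.345):
    """K-means-style assignment step (illustrative).

    X     : (n, p) covariate matrix
    y     : (n,)   responses
    betas : (K, p) one coefficient vector per candidate subgroup
    Returns an (n,) array of subgroup labels in {0, ..., K-1}, choosing for
    each sample the subgroup with the smallest Huber loss on its residual.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = np.asarray(betas, dtype=float)
    residuals = y[:, None] - X @ betas.T  # shape (n, K)
    return np.argmin(huber_loss(residuals, delta), axis=1)
```

In a full iterative scheme of this kind, the assignment step would alternate with refitting each subgroup's coefficients under a robust loss and a structured penalty until the labels stop changing. The linear tail of the Huber loss is what limits the influence of contaminated observations compared with squared error.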
Pages: 3229-3259
Number of pages: 31