A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response

Cited by: 0
Authors
He, Hanji [1]
He, Jianfeng [2]
Deng, Guangming [1, 3]
Affiliations
[1] Guilin Univ Technol, Sch Math & Stat, Guilin, Peoples R China
[2] South China Univ Technol, Sch Econ & Finance, Guangzhou, Peoples R China
[3] Guangxi Coll & Univ Key Lab Appl Stat, Guilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
REGRESSION
DOI
10.1155/2024/9014764
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The analysis of biogenetic data contributes significantly to understanding disease mechanisms and diagnosing rare diseases. In this analysis, selecting the significant features that affect a disease provides an effective basis for subsequent diagnosis and for choosing a treatment direction. This is not a simple task, however, because biogenetic data pose challenges such as the ultra-high dimensionality of the potential features, imbalanced response variables, and genetic associations. This study focuses on group structure in feature screening with biogenetic data. Because biogenetic data have a group structure, the analysis must consider whole gene groups rather than individual strongly correlated genes. To address this issue, this study proposes a group feature screening method that accounts for within-group correlation by using an adjusted Pearson chi-square statistic. The method applies to both continuous and discrete covariates. Simulation studies illustrate its performance: the proposed method performs well with imbalanced data and multi-category responses. In an application to lung cancer diagnosis, the proposed method classifies imbalanced data well, and its performance remains good when linear discriminant analysis is used for dimension reduction.
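The abstract does not state the exact form of the group statistic, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: continuous covariates are sliced at their empirical quantiles, each group is scored by the Pearson chi-square statistic between the categorical response and the joint categories of the group's members, and that statistic is divided by the logarithm of the number of joint categories (an adjustment in the spirit of adjusted Pearson chi-square screening) so that groups with many categories are not automatically favored. The function names `discretize`, `pearson_chi2`, `adjusted_group_score`, and `group_screen`, the integer-means-discrete rule, and the default of 4 slices are all hypothetical illustrations, not the authors' procedure.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def discretize(x: Sequence[float], n_bins: int = 4) -> List[int]:
    """Slice a continuous covariate at its empirical quantiles (assumed slicing rule)."""
    s = sorted(x)
    n = len(s)
    cuts = [s[min(n - 1, (k * n) // n_bins)] for k in range(1, n_bins)]
    # Bin label = number of cut points the value reaches or exceeds.
    return [sum(v >= c for c in cuts) for v in x]


def pearson_chi2(y: Sequence[Hashable], u: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the contingency table of y versus u."""
    n = len(y)
    ny, nu, nyu = Counter(y), Counter(u), Counter(zip(y, u))
    stat = 0.0
    for r, n_r in ny.items():
        for c, n_c in nu.items():
            expected = n_r * n_c / n
            stat += (nyu[(r, c)] - expected) ** 2 / expected
    return stat


def adjusted_group_score(y, columns, n_bins: int = 4) -> float:
    """Chi-square of y against the joint categories of one group, divided by
    log(number of joint categories). Hypothetical adjustment, not the paper's formula."""
    binned = []
    for col in columns:
        # Assumption: integer-valued columns are discrete; anything else is continuous.
        if all(isinstance(v, int) for v in col):
            binned.append(list(col))
        else:
            binned.append(discretize(col, n_bins))
    joint = list(zip(*binned))  # one joint category (a tuple) per observation
    n_cat = len(set(joint))
    if n_cat < 2:
        return 0.0  # a constant group carries no information about y
    return pearson_chi2(y, joint) / math.log(n_cat)


def group_screen(y, X: Dict[str, Sequence], groups: Dict[str, List[str]],
                 keep: int = 1, n_bins: int = 4) -> List[str]:
    """Rank groups by their adjusted score and return the names of the top `keep`."""
    scores = {g: adjusted_group_score(y, [X[name] for name in members], n_bins)
              for g, members in groups.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


if __name__ == "__main__":
    y = [i % 2 for i in range(20)]                   # balanced binary response
    X = {"x1": list(y),                              # discrete, identical to y
         "x3": [(i // 2) % 2 for i in range(20)],    # discrete, independent of y
         "x4": [float(i) for i in range(20)]}        # continuous, sliced into 4 bins
    groups = {"g1": ["x1"], "g2": ["x3", "x4"]}
    print(group_screen(y, X, groups, keep=2))  # → ['g1', 'g2']
```

In this toy data the informative group `g1` scores about 28.9 (a chi-square of 20 divided by log 2), while the noise group `g2` scores about 0.64, so `g1` is ranked first. Scoring the joint categories of a group, rather than summing per-feature statistics, is one simple way to let within-group dependence enter the score; the paper's actual statistic may differ.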
Pages: 21