MIXTURE CONDITIONAL REGRESSION WITH ULTRAHIGH

Times cited: 0
Authors
Shi, Jiaxin [1 ]
Wang, Fang [2 ]
Gao, Yuan [1 ]
Song, Xiaojun [3 ,4 ]
Wang, Hansheng [1 ]
Affiliations
[1] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
[2] Shandong Univ, Data Sci Inst, Jinan, Peoples R China
[3] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
[4] Peking Univ, Ctr Stat Sci, Beijing, Peoples R China
Source
ANNALS OF APPLIED STATISTICS | 2024, Vol. 18, No. 3
Funding
National Natural Science Foundation of China
Keywords
Key words and phrases. Expectation-maximization algorithm; judicial impartiality; mixture conditional regression; naïve Bayes model; ultrahigh dimensional data; MAXIMUM-LIKELIHOOD; EM ALGORITHM; MODEL; SELECTION
DOI
10.1214/24-AOAS1893
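Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract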
Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, where standard regression methods have been widely used to estimate the effects of extralegal factors. However, those methods cannot handle control variables of ultrahigh dimensionality, such as those found in judgment documents recorded in text format. To solve this problem, we develop a novel mixture conditional regression (MCR) approach, assuming that the whole sample can be classified into a number of latent classes. Within each latent class, a standard linear regression model describes the relationship between the response and a key feature vector, which is assumed to be of fixed dimension. Meanwhile, the ultrahigh dimensional control variables are used to determine latent class membership, with a naïve Bayes type model describing that relationship. Hence, the dimension of the control variables is allowed to be arbitrarily high. A novel expectation-maximization algorithm is developed for model estimation, with which the key parameters of interest can be estimated as efficiently as if the true class membership were known in advance. Simulation studies are presented to demonstrate the proposed MCR method, and a real dataset of Chinese burglary offenses is analyzed for illustration purposes.
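The abstract describes the model only in words. As a rough illustration, the Python sketch below fits one simplified version of such a model by expectation-maximization (EM). The assumptions are ours, not the paper's: the control variables Z are binary word-presence indicators modelled by a Bernoulli naïve Bayes likelihood within each class, the within-class regression errors are Gaussian, the number of classes K is known, and a Laplace smoothing constant `alpha` keeps the word probabilities away from 0 and 1. The function name `mcr_em` and all of its parameters are hypothetical; this is not the authors' estimator and makes no claim to their efficiency result.

```python
import numpy as np


def mcr_em(y, X, Z, K, n_iter=200, alpha=1.0, seed=0):
    """Hypothetical EM sketch for a K-class mixture of linear regressions
    whose class membership is linked to binary control variables Z through
    a Bernoulli naive Bayes model. Not the authors' implementation.

    y : (n,) response
    X : (n, p) key features (include a column of ones for an intercept)
    Z : (n, d) binary control variables; d may be far larger than n
    alpha : Laplace smoothing constant for the Bernoulli probabilities
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.asarray(Z, dtype=float)
    # Random soft initialisation of the class responsibilities.
    R = rng.dirichlet(np.ones(K), size=n)  # (n, K)
    for _ in range(n_iter):
        # M-step: closed-form updates given the current responsibilities.
        Nk = R.sum(axis=0)  # effective class sizes
        pi = Nk / n  # class prior probabilities
        theta = (R.T @ Z + alpha) / (Nk[:, None] + 2 * alpha)  # (K, d)
        beta = np.empty((K, p))
        sigma2 = np.empty(K)
        for k in range(K):
            w = R[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares
            resid = y - X @ beta[k]
            sigma2[k] = (w * resid**2).sum() / Nk[k]
        # E-step: posterior class probabilities, computed on the log scale.
        # Naive Bayes: the d indicators are independent within a class,
        # so their log-likelihoods simply add up (cost O(n * d * K)).
        log_nb = Z @ np.log(theta).T + (1.0 - Z) @ np.log1p(-theta).T  # (n, K)
        resid = y[:, None] - X @ beta.T  # (n, K)
        log_gauss = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        log_joint = np.log(pi) + log_nb + log_gauss
        log_joint -= log_joint.max(axis=1, keepdims=True)  # log-sum-exp shift
        R = np.exp(log_joint)
        R /= R.sum(axis=1, keepdims=True)
    return pi, theta, beta, sigma2, R
```

Because the naïve Bayes factorisation turns the class-conditional likelihood of Z into a sum over coordinates, each iteration in this sketch costs O(ndK) and never inverts a d×d matrix; only the p×p weighted least-squares systems are solved. As with any mixture model, the returned classes are identified only up to relabelling, so estimates should be matched to classes (for example by sorting on an intercept) before interpretation.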
Pages: 2532-2550
Number of pages: 19