SMOTEBoost for Regression: Improving the Prediction of Extreme Values

Cited by: 25
Authors
Moniz, Nuno [1]
Ribeiro, Rita P. [1]
Cerqueira, Vitor [1]
Chawla, Nitesh [2]
Affiliations
[1] Univ Porto, INESC TEC, Porto, Portugal
[2] Univ Notre Dame, Notre Dame, IN USA
Keywords
Imbalanced Domain Learning; Ensemble Learning; Boosting; Regression;
DOI
10.1109/DSAA.2018.00025
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Supervised learning with imbalanced domains is one of the biggest challenges in machine learning. Such tasks differ from standard learning tasks by assuming a skewed distribution of target variables, and user domain preference towards under-represented cases. Most research has focused on imbalanced classification tasks, where a wide range of solutions has been tested. Still, little work has been done concerning imbalanced regression tasks. In this paper, we propose an adaptation of the SMOTEBoost approach for the problem of imbalanced regression. Originally designed for classification tasks, it combines boosting methods and the SMOTE resampling strategy. We present four variants of SMOTEBoost and provide an experimental evaluation using 30 datasets with an extensive analysis of results in order to assess the ability of SMOTEBoost methods in predicting extreme target values, and their predictive trade-off concerning baseline boosting methods. SMOTEBoost is publicly available in a software package.
Pages: 150-159
Page count: 10
Related papers
50 in total
  • [22] Ensemble approach for improving prediction in kernel regression and classification
    Han, Sunwoo
    Hwang, Seongyun
    Lee, Seokho
    COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2016, 23 (04) : 355 - 362
  • [23] Improving Prediction Accuracy for Logistic Regression On Imbalanced Datasets
    Zhang, Hao
    Li, Zhuolin
    Shahriar, Hossain
    Tao, Lixin
    Bhattacharya, Prabir
    Qian, Ying
    2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2019, : 918 - 919
  • [24] Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting
    Sanchez, Juan Carlos Moreno
    Mesa, Hector Gabriel Acosta
    Espinosa, Adrian Trueba
    Castilla, Sergio Ruiz
    Lamont, Farid Garcia
    SMART AGRICULTURAL TECHNOLOGY, 2025, 10
  • [25] Extreme values identification in regression using a peaks-over-threshold approach
    Wong, Tong Siu Tung
    Li, Wai Keung
    JOURNAL OF APPLIED STATISTICS, 2015, 42 (03) : 566 - 576
  • [26] A prediction-based alternative to P values in regression models
    Lu, Min
    Ishwaran, Hemant
    JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY, 2018, 155 (03) : 1130 - +
  • [27] Approximated perfect values in logistic regression for prediction and outlier detection
    Artis, M
    Ayuso, M
    Guillen, M
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2003, 32 (04) : 841 - 850
  • [28] Improving extreme offshore wind speed prediction by using deconvolution
    Gaidai, Oleg
    Xing, Yihan
    Balakrishna, Rajiv
    Xu, Jingxiang
    HELIYON, 2023, 9 (02)
  • [29] Extreme regression
    LeBlanc, M
    Moon, J
    Kooperberg, C
    BIOSTATISTICS, 2006, 7 (01) : 71 - 84
  • [30] Prognostic prediction in acute heart failure patients with extreme BNP values
    Lourenco, Patricia
    Ribeiro, Ana
    Pintalhao, Mariana
    Cunha, Filipe M.
    Pereira, Joana
    Marques, Pedro
    Vilaca, Joao Pedro
    Amorim, Marta
    Silva, Sergio
    Bettencourt, Paulo
    BIOMARKERS, 2017, 22 (08) : 715 - 722