A parallel feature selection method based on NMI-XGBoost and distance correlation for typhoon trajectory prediction

Cited by: 3
Authors
Qiao, Baiyou [1]
Wu, Jiaqi [1]
Wang, Rui [2]
Hao, Yuanqing [1]
Wang, Peirui [1]
Han, Donghong [1]
Wu, Gang [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110169, Peoples R China
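Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 08
Funding
National Natural Science Foundation of China
Keywords
Feature selection; NMI; XGBoost; Distance correlation; Spark; ASSOCIATION; DEPENDENCE; MODEL
DOI
10.1007/s11227-023-05863-3
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract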
Typhoon trajectory data involve many factors, including atmospheric, oceanic, and physical factors. The data are high-dimensional and exhibit strong spatio-temporal and nonlinear correlations, which makes typhoon trajectory prediction more difficult. Selecting appropriate prediction factors through feature selection has become an important way to reduce the dimensionality of typhoon trajectory data and to improve the performance and accuracy of prediction methods. However, existing feature selection methods based on linear correlation analysis cannot adequately capture the nonlinear correlations between features, which lowers the accuracy of feature selection, while methods based on nonlinear correlation analysis are computationally expensive, which hurts the timeliness of feature selection. To address these problems, we propose NX-Spark-DC, a parallel feature selection method for typhoon trajectory data built on the Spark platform. The method first filters out redundant features using normalized mutual information (NMI) and then eliminates useless features using an XGBoost model, thereby reducing the dimensionality of the data. On this basis, an improved Spark-based parallel distance correlation algorithm (Spark-DC) is proposed to select the feature combinations with strong correlation. Experimental results show that NX-Spark-DC achieves high execution efficiency and accuracy and significantly outperforms existing methods.
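To illustrate why distance correlation is attractive for nonlinear feature screening, the following is a minimal single-machine sketch of the standard sample distance correlation for two one-dimensional features. It is not the Spark-DC algorithm from the paper, whose parallelization details are not given in this record; the function names `centered_distance_matrix` and `distance_correlation` are hypothetical and chosen only for this example.

```python
import math


def centered_distance_matrix(values):
    """Pairwise absolute distances, double-centered (row, column, grand mean)."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)   # squared distance covariance
    dvar_x = mean_product(a, a)  # squared distance variance of x
    dvar_y = mean_product(b, b)  # squared distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant feature carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


# A symmetric quadratic relation: linear correlation is zero here,
# but distance correlation is clearly positive.
x = [i / 50 - 1 for i in range(101)]
y_linear = [2 * v + 1 for v in x]
y_square = [v * v for v in x]
dcor_linear = distance_correlation(x, y_linear)
dcor_square = distance_correlation(x, y_square)
```

This sketch costs O(n^2) time and memory per feature pair, which is the kind of expense the abstract attributes to nonlinear correlation analysis and the motivation for a parallel implementation.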
Pages: 11293 - 11321
Page count: 29