A parallel feature selection method based on NMI-XGBoost and distance correlation for typhoon trajectory prediction

Cited by: 3
Authors
Qiao, Baiyou [1]
Wu, Jiaqi [1]
Wang, Rui [2]
Hao, Yuanqing [1]
Wang, Peirui [1]
Han, Donghong [1]
Wu, Gang [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110169, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 08
Funding
National Natural Science Foundation of China
Keywords
Feature selection; NMI; XGBoost; Distance correlation; Spark; ASSOCIATION; DEPENDENCE; MODEL
DOI
10.1007/s11227-023-05863-3
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Typhoon trajectory-related data involve many factors, such as atmospheric, oceanic, and physical factors. These data are high-dimensional and exhibit strong spatio-temporal correlation and nonlinear correlation, which makes typhoon trajectory prediction more difficult. Using feature selection to choose appropriate predictors has become an important means of reducing the dimension of typhoon trajectory-related data and improving the performance and accuracy of typhoon trajectory prediction methods. However, existing feature selection methods based on linear correlation analysis cannot adequately capture the nonlinear correlations between data features, which leads to low feature selection accuracy, while feature selection methods based on nonlinear correlation analysis are computationally expensive, which hurts the timeliness of feature selection. To address these problems, we propose NX-Spark-DC, a parallel feature selection method for typhoon trajectory-related data built on the Spark platform. The method first filters out redundant features of typhoon-related data using normalized mutual information (NMI), then eliminates useless features using the XGBoost machine learning model, thereby reducing the dimension of the data. On this basis, an improved Spark-based parallel distance correlation algorithm (Spark-DC) is proposed to select feature combinations with strong correlation. A series of experiments show that NX-Spark-DC achieves high execution efficiency and accuracy and significantly outperforms existing methods.
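The abstract names two dependence measures without giving their formulas. As a rough, non-authoritative illustration, the pure-Python sketch below computes NMI (assuming features have already been discretized, and assuming the arithmetic-mean normalization 2I(X;Y)/(H(X)+H(Y)), which is one common variant and not necessarily the paper's) and the sample distance correlation of Székely et al. The greedy `drop_redundant` filter and its `threshold` value are hypothetical stand-ins for the NMI redundancy step. The XGBoost step and the Spark parallelization (Spark-DC) are not reproduced; the quadratic pairwise loop in `distance_correlation` only shows why nonlinear correlation analysis becomes costly on large data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information I(X;Y) (nats) of two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in Counter(zip(x, y)).items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def nmi(x, y):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 1.0  # both features are constant: treat them as fully redundant
    return 2.0 * mutual_information(x, y) / (hx + hy)


def drop_redundant(columns, threshold=0.9):
    """Hypothetical greedy NMI filter: keep a feature only if its NMI with
    every feature kept so far is below `threshold` (an illustrative value)."""
    kept = []
    for name, values in columns.items():
        if all(nmi(values, columns[k]) < threshold for k in kept):
            kept.append(name)
    return kept


def _double_centered(v):
    """Pairwise |v_i - v_j| matrix with row, column, and grand means removed."""
    n = len(v)
    a = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in a]  # symmetric, so row means == column means
    grand = sum(mean) / n
    return [[a[i][j] - mean[i] - mean[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (Székely et al.) of two numeric sequences; O(n^2)."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(A[i][j] * B[i][j] for i, j in pairs) / n**2
    dvar_x = sum(A[i][j] ** 2 for i, j in pairs) / n**2
    dvar_y = sum(B[i][j] ** 2 for i, j in pairs) / n**2
    if dvar_x * dvar_y == 0.0:
        return 0.0  # a constant sequence has zero distance variance
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))
```

For example, on the symmetric sample x = [-2, -1, 0, 1, 2] with y = x², the Pearson correlation is 0, yet `distance_correlation` returns about 0.52: this is the kind of nonlinear dependence the abstract says linear correlation analysis misses.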
Pages: 11293-11321
Page count: 29