Handling data imbalance in machine learning based landslide susceptibility mapping: a case study of Mandakini River Basin, North-Western Himalayas

Cited by: 0
Authors
Sharad Kumar Gupta
Dericks P. Shukla
Affiliations
[1] School of Civil and Environmental Engineering, Indian Institute of Technology Mandi, Himachal Pradesh
[2] Porter School of Environment and Earth Sciences, Tel Aviv University
Source
Landslides | 2023 / Vol. 20
Keywords
Machine learning; Landslide susceptibility mapping; Imbalanced learning; Undersampling; Support vector machine; Artificial neural network
DOI
Not available
Abstract
Machine learning methods require a large amount of data to train a model. For landslide susceptibility mapping, the data consist of landslide causative factors as predictors and a landslide inventory as the response variable. However, landslides occur at only a limited number of locations within an area. This leads to a severely skewed class distribution, in which the number of landslide samples (minority class) is far smaller than the number of non-landslide samples (majority class). The imbalance hampers the predictive ability of learning algorithms, so the final models perform poorly on the class with fewer samples. This work uses two undersampling techniques, EasyEnsemble (EE) and BalanceCascade (BC), to reduce the effect of imbalance in the data. The landslides that occurred between 2004 and 2013 are randomly divided into two groups, 70% of the samples for training and 30% for testing, whereas the landslides that occurred between 2014 and 2017 are used for validation. The balanced data are used with the support vector machine (SVM) and artificial neural network (ANN), giving four approaches for susceptibility mapping: EESVM, EEANN, BCSVM, and BCANN. We used several metrics, such as recall, geometric mean (G-mean), precision, accuracy, and Heidke skill score, to evaluate the performance of the landslide susceptibility maps. The area under the ROC curve (AUC) for imbalanced data with SVM and ANN is 0.50, which shows that the models cannot discriminate between landslide and non-landslide locations. This misclassification is due to the small number of landslide samples and serious class bias. The data balanced with the EE and BC methods give promising results and show significant improvements: the AUC of EESVM, EEANN, BCSVM, and BCANN is 0.869, 0.918, 0.881, and 0.923, respectively.
Among all the methods, the recall and G-mean values were highest for EEANN, which indicates that EEANN separates landslide samples best. Furthermore, we used the standard error (SE) of the AUC and the 95% confidence interval to test the significance of the various combinations of classification and undersampling schemes. The SE is highest for EESVM and BCSVM among all methods. Based on several accuracy metrics, we conclude that EEANN performs better than all the other methods. The BC-based method does not perform well for landslide susceptibility mapping and gives the highest misclassification of landslide samples. The study shows that susceptibility maps prepared from balanced data using SVM and ANN are markedly more accurate than those prepared from imbalanced data.
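The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the standard error of the AUC was computed, so the Hanley-McNeil approximation used below is an assumption, as are the function names and the sample counts in the usage lines. The confusion-matrix scores follow their standard definitions, with landslide treated as the positive class.

```python
import math


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero (a convention chosen for this sketch)."""
    return num / den if den else 0.0


def confusion_metrics(tp, fn, fp, tn):
    """Scores from a binary confusion matrix (landslide = positive class)."""
    n = tp + fn + fp + tn
    recall = _ratio(tp, tp + fn)          # fraction of landslide samples detected
    specificity = _ratio(tn, tn + fp)     # fraction of non-landslide samples detected
    correct = tp + tn
    # Number of correct predictions expected by chance, given the marginal totals.
    expected = _ratio((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp), n)
    return {
        "recall": recall,
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(correct, n),
        "g_mean": math.sqrt(recall * specificity),
        # Heidke skill score: improvement over chance, 0 means no skill.
        "hss": _ratio(correct - expected, n - expected),
    }


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the standard error of an AUC (assumed method)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation interval (z = 1.96 for 95%), clipped to [0, 1]."""
    se = auc_standard_error(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical counts: a model that labels every sample "non-landslide".
    print(confusion_metrics(tp=0, fn=100, fp=0, tn=900))
    # AUC taken from the abstract (EEANN); the sample sizes are hypothetical.
    print(auc_confidence_interval(0.918, n_pos=100, n_neg=100))
```

The first usage line shows why accuracy alone is misleading on imbalanced data: a model that never predicts a landslide still reaches 0.9 accuracy on these hypothetical counts, while recall, G-mean, and the Heidke skill score are all zero.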
Pages: 933-949
Page count: 16