Handling data imbalance in machine learning based landslide susceptibility mapping: a case study of Mandakini River Basin, North-Western Himalayas

Cited by: 0
Authors
Sharad Kumar Gupta
Dericks P. Shukla
Affiliations
[1] School of Civil and Environmental Engineering, Indian Institute of Technology Mandi, Himachal Pradesh
[2] Porter School of Environment and Earth Sciences, Tel Aviv University
Source
Landslides | 2023 / Vol. 20
Keywords
Machine learning; Landslide susceptibility mapping; Imbalanced learning; Undersampling; Support vector machine; Artificial neural network
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Machine learning methods require large amounts of data to train a model. For landslide susceptibility mapping, the data consist of landslide causative factors as predictors and a landslide inventory as the response variable. However, landslides occur in only a limited part of any area, which leads to a severely skewed class distribution: landslide samples (the minority class) are far fewer than non-landslide locations (the majority class). This imbalance hampers the predictive ability of learning algorithms, so the final models perform poorly on the class with fewer samples. This work uses two undersampling techniques, EasyEnsemble (EE) and BalanceCascade (BC), to reduce the effect of the imbalance. The landslides that occurred between 2004 and 2013 are randomly divided into two groups, 70% of the samples for training and 30% for testing, whereas the landslides that occurred between 2014 and 2017 are used for validation. The balanced data are used with the support vector machine (SVM) and artificial neural network (ANN), giving four approaches for susceptibility mapping: EESVM, EEANN, BCSVM, and BCANN. Several metrics, including recall, geometric mean (G-mean), precision, accuracy, and Heidke skill score, are used to evaluate the landslide susceptibility maps. The area under the curve (AUC) for imbalanced data with SVM and ANN is 0.50, which shows that the models cannot discriminate between landslide and non-landslide locations. This misclassification is due to the small number of landslide samples and the severe class bias. Balancing the data with the EE and BC methods gives significant improvements: the AUC of EESVM, EEANN, BCSVM, and BCANN is 0.869, 0.918, 0.881, and 0.923, respectively.
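The undersampling idea behind EasyEnsemble can be sketched roughly as follows. This is a minimal illustration of the sampling step only, not the authors' implementation: the function name `easy_ensemble_subsets`, the label coding (1 = landslide, 0 = non-landslide), and the toy data are assumptions made here. Each subset keeps every minority sample and adds an equally sized random draw from the majority class; a base classifier (SVM or ANN in the study) would then be trained on each subset and the predictions combined.

```python
import random


def easy_ensemble_subsets(labels, n_subsets=5, seed=0):
    """Return n_subsets lists of sample indices, each class-balanced.

    Every subset keeps all minority samples (label 1) and an equally sized
    random draw, without replacement, from the majority samples (label 0).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or len(majority) < len(minority):
        raise ValueError("expected label 1 to be the non-empty minority class")

    subsets = []
    for _ in range(n_subsets):
        # Independent draw for each subset, so different majority samples
        # are seen across the ensemble.
        sampled_majority = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled_majority))
    return subsets


# Toy labels (hypothetical): 4 landslide cells among 40 cells.
labels = [1] * 4 + [0] * 36
subsets = easy_ensemble_subsets(labels, n_subsets=3, seed=42)
```

BalanceCascade differs in that majority samples already classified correctly are removed before the next subset is drawn; that step needs a trained classifier and is not shown here.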
Among all the methods, recall and G-mean were highest for EEANN, indicating that EEANN separates landslide samples best. Furthermore, the standard error (SE) of the AUC and the 95% confidence interval are used to test the significance of the various combinations of classification and undersampling schemes. The SE is highest for EESVM and BCSVM. Based on several accuracy metrics, we conclude that EEANN performs better than all the other methods. The BC-based approach does not perform well for landslide susceptibility mapping and produces the highest misclassification of landslide samples. The study shows that susceptibility maps prepared from balanced data using SVM and ANN are markedly more accurate than those prepared from imbalanced data.
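The metrics named in the abstract can be computed from a 2x2 confusion matrix, as in the sketch below. The counts are hypothetical and are not results from the paper. The abstract does not say how the SE of the AUC was obtained; the Hanley and McNeil (1982) approximation is used here as an assumption, and the sample sizes passed to it are invented for illustration.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Recall, precision, accuracy, G-mean and Heidke skill score (HSS).

    Assumes no denominator is zero (each class is present and at least
    one sample is predicted positive).
    """
    recall = tp / (tp + fn)            # sensitivity on landslide samples
    specificity = tn / (tn + fp)       # recall on non-landslide samples
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    g_mean = math.sqrt(recall * specificity)
    hss = 2 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "hss": hss,
    }


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the standard error of an AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation confidence interval (95% for z = 1.96)."""
    se = auc_standard_error(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical confusion-matrix counts, for illustration only.
metrics = confusion_metrics(tp=80, fn=20, fp=30, tn=70)

# AUC of EEANN as reported in the abstract; the sample sizes are assumed.
lower, upper = auc_confidence_interval(0.918, n_pos=100, n_neg=100)
```

G-mean rewards a model only when it does well on both classes, which is why it is more informative than accuracy on skewed data: a model that labels every cell as non-landslide has high accuracy but a G-mean of zero.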
Pages: 933-949
Page count: 16