TY - JOUR
T1 - Selection of contributing factors for predicting landslide susceptibility using machine learning and deep learning models
AU - Chen, Cheng
AU - Fan, Lei
N1 - Funding Information:
This research was funded by the Xi’an Jiaotong-Liverpool University Key Program Special Fund under grant number KSF-E-40 and the Research Enhancement Fund under grant number REF-21-01-003.
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/9/13
Y1 - 2023/9/13
N2 - Landslides are a common natural disaster that can cause casualties, property damage and economic losses. It is therefore important to predict the probability of landslide occurrence at potentially risky sites. A common approach is to carry out a landslide susceptibility assessment based on a landslide inventory and a set of landslide contributing factors. This can be done using machine learning (ML) models such as logistic regression (LR), support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost), or deep learning (DL) models such as convolutional neural network (CNN) and long short-term memory (LSTM). As input data for these models, landslide contributing factors have varying influences on landslide occurrence. It is therefore reasonable to select the more important contributing factors and eliminate less relevant ones, with the aim of increasing the prediction accuracy of these models. However, selecting the more important factors remains a challenging task, and there is no generally accepted method. Furthermore, the effects of different factor selection methods on the prediction accuracy of ML and DL models are unclear. In this study, the impact of the selection of contributing factors on the accuracy of landslide susceptibility predictions using ML and DL models was investigated. Five methods for selecting contributing factors were considered for all the aforementioned ML and DL models: Information Gain Ratio (IGR), Recursive Feature Elimination (RFE), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO) and Harris Hawks Optimization (HHO). In addition, autoencoder-based factor selection methods for DL models were investigated. To assess their performance, an exhaustive approach was adopted that tested all possible selections of contributing factors, and its results served as the benchmark.
The results confirmed that using more important contributing factors improved the prediction accuracy of the ML models considered. However, selecting contributing factors using IGR and RFE reduced the prediction accuracy of the DL models. For the DL models, using an autoencoder architecture improved prediction performance. The study also found that the choice of factor selection methods was far more effective than the selection of contributing factors in improving the prediction accuracy of landslide susceptibility.
AB - Landslides are a common natural disaster that can cause casualties, property damage and economic losses. It is therefore important to predict the probability of landslide occurrence at potentially risky sites. A common approach is to carry out a landslide susceptibility assessment based on a landslide inventory and a set of landslide contributing factors. This can be done using machine learning (ML) models such as logistic regression (LR), support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost), or deep learning (DL) models such as convolutional neural network (CNN) and long short-term memory (LSTM). As input data for these models, landslide contributing factors have varying influences on landslide occurrence. It is therefore reasonable to select the more important contributing factors and eliminate less relevant ones, with the aim of increasing the prediction accuracy of these models. However, selecting the more important factors remains a challenging task, and there is no generally accepted method. Furthermore, the effects of different factor selection methods on the prediction accuracy of ML and DL models are unclear. In this study, the impact of the selection of contributing factors on the accuracy of landslide susceptibility predictions using ML and DL models was investigated. Five methods for selecting contributing factors were considered for all the aforementioned ML and DL models: Information Gain Ratio (IGR), Recursive Feature Elimination (RFE), Particle Swarm Optimization (PSO), Least Absolute Shrinkage and Selection Operator (LASSO) and Harris Hawks Optimization (HHO). In addition, autoencoder-based factor selection methods for DL models were investigated. To assess their performance, an exhaustive approach was adopted that tested all possible selections of contributing factors, and its results served as the benchmark.
The results confirmed that using more important contributing factors improved the prediction accuracy of the ML models considered. However, selecting contributing factors using IGR and RFE reduced the prediction accuracy of the DL models. For the DL models, using an autoencoder architecture improved prediction performance. The study also found that the choice of factor selection methods was far more effective than the selection of contributing factors in improving the prediction accuracy of landslide susceptibility.
KW - Contributing factors
KW - Deep learning
KW - Factor selection
KW - Landslide
KW - Landslide susceptibility
KW - Machine learning
KW - Prediction
UR - http://www.scopus.com/inward/record.url?scp=85171194474&partnerID=8YFLogxK
U2 - 10.1007/s00477-023-02556-4
DO - 10.1007/s00477-023-02556-4
M3 - Article
AN - SCOPUS:85171194474
SN - 1436-3240
JO - Stochastic Environmental Research and Risk Assessment
JF - Stochastic Environmental Research and Risk Assessment
ER -