TY - JOUR
T1 - A hybrid approach for modeling bicycle crash frequencies
T2 - Integrating random forest based SHAP model with random parameter negative binomial regression model
AU - Ding, Hongliang
AU - Wang, Ruiqi
AU - Chen, Tiantian
AU - Sze, N. N.
AU - Chung, Hyungchul
AU - Dong, Ni
N1 - Publisher Copyright: © 2024
PY - 2024/12
Y1 - 2024/12
N2 - To effectively capture and explain complex, nonlinear relationships within bicycle crash frequency data and account for unobserved heterogeneity simultaneously, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial regression model (RPNB). First, four machine learning algorithms, including random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. The RF algorithm, demonstrating the best performance, was selected and integrated into an interpretable machine learning-based method (i.e., RF-SHAP) to provide an interpretable measure of each variable's impact, which is critical for understanding the model's prediction results. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. Using 288 traffic analysis zones (TAZs) in Greater London and various regional risk factors for bicycle crash frequency, the proposed framework was validated. The results indicate that the proposed framework demonstrates improved prediction accuracy and better factor interpretation in analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating its reliable explanatory power. Furthermore, there is a significant improvement in the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). This suggests that the proposed model effectively combines the explanatory power of statistical models with the forecasting powers of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information to develop targeted interventions.
AB - To effectively capture and explain complex, nonlinear relationships within bicycle crash frequency data and account for unobserved heterogeneity simultaneously, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial regression model (RPNB). First, four machine learning algorithms, including random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. The RF algorithm, demonstrating the best performance, was selected and integrated into an interpretable machine learning-based method (i.e., RF-SHAP) to provide an interpretable measure of each variable's impact, which is critical for understanding the model's prediction results. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. Using 288 traffic analysis zones (TAZs) in Greater London and various regional risk factors for bicycle crash frequency, the proposed framework was validated. The results indicate that the proposed framework demonstrates improved prediction accuracy and better factor interpretation in analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating its reliable explanatory power. Furthermore, there is a significant improvement in the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). This suggests that the proposed model effectively combines the explanatory power of statistical models with the forecasting powers of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information to develop targeted interventions.
KW - Bicycle frequency
KW - Hybrid approach
KW - Random Forest based SHAP
KW - Random parameter negative binomial regression model
KW - Regional factors
KW - Unobserved effects
UR - http://www.scopus.com/inward/record.url?scp=85203659380&partnerID=8YFLogxK
U2 - 10.1016/j.aap.2024.107778
DO - 10.1016/j.aap.2024.107778
M3 - Article
AN - SCOPUS:85203659380
SN - 0001-4575
VL - 208
JO - Accident Analysis and Prevention
JF - Accident Analysis and Prevention
M1 - 107778
ER -