An investigation into the impact of temporality on COVID-19 infection and mortality predictions: New perspective based on Shapley values

Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Machine learning models have been used to predict COVID-19 infections and mortality, but many were trained and tested on data from different periods. This study investigates the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance for predicting COVID-19 infections and mortality, and seeks to understand what drives that impact.

Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021, and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression with 20 model features. Models were trained and tested on data from the same year and from different years to examine the impact of temporality. To explain this impact and its driving factors, Shapley values were used to quantify the contributions of individual features to model predictions.

Results: For the infection model, the temporal gap had a negative impact on prediction accuracy: on average, accuracy fell by 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, indicating a weaker impact of temporality than in the infection model. Shapley values uncovered the reason for this difference between the infection and mortality models.

Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but did not find such a negative impact for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made the predominant contributions to the mortality model across all three years (2020–2022), whereas the infection model had no such fixed set of features across years.
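To make concrete what a Shapley value measures, the following is a minimal sketch and not the authors' pipeline. It computes exact Shapley values by enumerating every feature coalition, which is only practical for a handful of features; with the study's 20 features an approximation method would normally be used instead. The three-feature linear score, its weights, the baseline, and the names `shapley_values` and `value_fn` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values, enumerating every coalition of the other features."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical linear score over three made-up features (illustration only).
WEIGHTS = [0.5, -0.25, 1.0]
BASELINE = [0.0, 0.0, 0.0]   # assumed reference input for "absent" features
X = [2.0, 4.0, 1.0]          # the single input being explained


def value_fn(coalition):
    """Score when features in `coalition` take their real value, others the baseline."""
    return sum(w * (X[j] if j in coalition else BASELINE[j])
               for j, w in enumerate(WEIGHTS))


phi = shapley_values(value_fn, len(WEIGHTS))
print(phi)
```

For a linear score like this one, each feature's Shapley value reduces to its weight times its deviation from the baseline, and the values sum to the difference between the full prediction and the baseline prediction. Comparing which features carry the largest values in each year is the kind of analysis the abstract describes.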

Original language: English
Article number: 111
Journal: BMC Medical Research Methodology
Volume: 25
Publication status: Published - 24 Apr 2025

Keywords

  • COVID-19 infection prediction
  • COVID-19 mortality prediction
  • Random forest
  • Shapley values
  • Temporality impact
