An investigation into the impact of temporality on COVID-19 infection and mortality predictions: New perspective based on Shapley values

Mingming Chen; Qihang Qian; Xiang Pan; Tenglong Li

doi:10.1186/s12874-025-02572-8

Mingming Chen, Qihang Qian, Xiang Pan, Tenglong Li^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance for predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of the impact of temporality.

Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021 and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years as well as from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify the contributions of individual features to model predictions.

Results: For the infection model, we found that the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind such differences between the infection and mortality models.

Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made the predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across years.
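The abstract reports an average "loss in accuracy" when training and testing sets come from different years, but does not spell out how that average is formed. The following is a minimal sketch of one plausible reading: the mean accuracy of same-year train/test pairs minus the mean accuracy of cross-year pairs. The accuracy figures, the `accuracy` dictionary and the `temporal_gap_loss` function are hypothetical and for illustration only; they are not the study's data, code, or results.

```python
from statistics import mean

# Hypothetical accuracies keyed by (training year, testing year).
# These numbers are made up for illustration; they are NOT results from the study.
accuracy = {
    (2020, 2020): 0.80, (2020, 2021): 0.76, (2020, 2022): 0.74,
    (2021, 2020): 0.77, (2021, 2021): 0.81, (2021, 2022): 0.78,
    (2022, 2020): 0.75, (2022, 2021): 0.78, (2022, 2022): 0.82,
}


def temporal_gap_loss(acc):
    """Return mean same-year accuracy minus mean cross-year accuracy.

    Assumed definition: a positive value means that, on average, models
    score lower when tested on a different year than they were trained on.
    """
    same_year = [a for (train, test), a in acc.items() if train == test]
    cross_year = [a for (train, test), a in acc.items() if train != test]
    return mean(same_year) - mean(cross_year)


loss = temporal_gap_loss(accuracy)
print(f"average accuracy loss with a temporal gap: {loss:.4f}")
```

Under this reading, the same calculation would be repeated per model family (logistic regression, random forest) and per outcome (infection, mortality) to obtain figures comparable to those quoted in the abstract.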

Original language	English
Article number	111
Journal	BMC Medical Research Methodology
Volume	25
DOIs	https://doi.org/10.1186/s12874-025-02572-8
Publication status	Published - 24 Apr 2025

Keywords

COVID-19 infection prediction
COVID-19 mortality prediction
Random forest
Shapley values
Temporality import

Access to Document

10.1186/s12874-025-02572-8

Licence: CC BY-ND

Cite this

@article{862c3dcad6c74cdbbc9a6b07b0c6a6ee,
title = "An investigation into the impact of temporality on COVID-19 infection and mortality predictions: New perspective based on Shapley values",
abstract = "Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance for predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of the impact of temporality. Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021 and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years as well as from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify the contributions of individual features to model predictions. Results: For the infection model, we found that the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind such differences between the infection and mortality models. Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made the predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across years.",
keywords = "COVID-19 infection prediction, COVID-19 mortality prediction, Random forest, Shapley values, Temporality import",
author = "Mingming Chen and Qihang Qian and Xiang Pan and Tenglong Li",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = apr,
day = "24",
doi = "10.1186/s12874-025-02572-8",
language = "English",
volume = "25",
journal = "BMC Medical Research Methodology",
}

TY  - JOUR
T1  - An investigation into the impact of temporality on COVID-19 infection and mortality predictions: New perspective based on Shapley values
AU  - Chen, Mingming
AU  - Qian, Qihang
AU  - Pan, Xiang
AU  - Li, Tenglong
N1  - Publisher Copyright: © The Author(s) 2025.
PY  - 2025/4/24
Y1  - 2025/4/24
AB  - Introduction: Machine learning models have been employed to predict COVID-19 infections and mortality, but many models were built on training and testing sets from different periods. The purpose of this study is to investigate the impact of temporality, i.e., the temporal gap between training and testing sets, on model performance for predicting COVID-19 infections and mortality. Furthermore, this study seeks to understand the causes of the impact of temporality. Methods: This study used a COVID-19 surveillance dataset collected in Brazil in 2020, 2021 and 2022, and built prediction models for COVID-19 infections and mortality using random forest and logistic regression, with 20 model features. Models were trained and tested on data from different years as well as from the same year, to examine the impact of temporality. To further explain the impact of temporality and its driving factors, Shapley values were employed to quantify the contributions of individual features to model predictions. Results: For the infection model, we found that the temporal gap had a negative impact on prediction accuracy. On average, the loss in accuracy was 0.0256 for logistic regression and 0.0436 for random forest when there was a temporal gap between the training and testing sets. For the mortality model, the loss in accuracy was 0.0144 for logistic regression and 0.0098 for random forest, which means the impact of temporality was not as strong as in the infection model. Shapley values uncovered the reason behind such differences between the infection and mortality models. Conclusions: Our study confirmed the negative impact of temporality on model performance for predicting COVID-19 infections, but it did not find such a negative impact of temporality for predicting COVID-19 mortality. Shapley values revealed that a fixed set of four features made the predominant contributions to the mortality model across all three years of data (2020–2022), while for the infection model there was no such fixed set of features across years.
KW  - COVID-19 infection prediction
KW  - COVID-19 mortality prediction
KW  - Random forest
KW  - Shapley values
KW  - Temporality import
UR  - http://www.scopus.com/inward/record.url?scp=105003481607&partnerID=8YFLogxK
U2  - 10.1186/s12874-025-02572-8
DO  - 10.1186/s12874-025-02572-8
M3  - Article
C2  - 40275181
VL  - 25
JO  - BMC Medical Research Methodology
JF  - BMC Medical Research Methodology
M1  - 111
ER  -
