Prediction of severe COVID-19 outcomes at the time of testing: An anomaly detection approach


Early and effective detection of severe infection cases during a pandemic can significantly improve patient prognosis and resource allocation. We develop a machine learning framework for detecting severe COVID-19 cases at the time of RT-PCR testing. We retrospectively studied 988 patients from a small Canadian province who tested positive for SARS-CoV-2, of whom 42 (4%) were at-risk cases (i.e., resulting in hospitalization, admission to ICU, or death) and 8 (< 1%) died. The limited information available at the time of RT-PCR testing included age, comorbidities, and patients' reported symptoms, totaling 27 features. Vaccination status was unavailable. Due to the severe class imbalance and small dataset size, we formulated the problem of detecting severe COVID-19 as anomaly detection and applied three models: one-class support vector machine (OCSVM), weight-adjusted XGBoost, and weight-adjusted AdaBoost. The OCSVM was the best-performing model for detecting the deceased cases, with an average 95% true positive rate (TPR) and 27.2% false positive rate (FPR). Meanwhile, XGBoost provided the best performance for detecting the at-risk cases, with an average 96.2% TPR and 19% FPR. In addition, we developed a novel extension to SHAP interpretability to explain the outputs from the models. In agreement with conventional knowledge, we found that comorbidities were influential in predicting severity; however, we also found that symptoms were generally more influential, noting that machine learning combines all available data and is not a univariate statistical analysis.
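To make the reported metrics and the class imbalance concrete, the following minimal sketch computes TPR and FPR from binary labels and derives a negative-to-positive weight from the cohort counts given above. The function names and the toy labels are hypothetical, and the negative-to-positive ratio is only one common weighting heuristic; the abstract does not state how the weights for XGBoost and AdaBoost were actually chosen.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 marks a severe case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # TPR: share of severe cases that were flagged.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    # FPR: share of non-severe cases that were flagged anyway.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


def positive_class_weight(n_total, n_positive):
    """Negatives per positive: an assumed heuristic, not the paper's stated method."""
    return (n_total - n_positive) / n_positive


# Cohort counts taken from the abstract: 988 patients, 42 at-risk, 8 deceased.
at_risk_weight = positive_class_weight(988, 42)
deceased_weight = positive_class_weight(988, 8)

# Toy labels (hypothetical, for illustration only).
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0]
toy_tpr, toy_fpr = tpr_fpr(toy_true, toy_pred)
```

With so few positive cases, a single missed death moves the TPR by 12.5 percentage points when all 8 deceased cases are scored together, which is one reason averaged TPR and FPR are reported rather than a single split.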

Dionne M. Aleman, PhD, PEng
Associate Professor of Industrial Engineering