Abstract:
Objective To construct a machine learning (ML) model for predicting early neurological deterioration (END) after intravenous thrombolysis (IVT) in patients with acute ischemic stroke (AIS), and to analyze the risk factors for END using Shapley additive explanations (SHAP).
Methods A total of 97 AIS patients who received IVT were enrolled. Patients were divided into an END group (18 cases) and a non-END group (79 cases) according to whether they experienced END within 24 hours after IVT. All patients were randomly divided into a training set (n=68) and a validation set (n=29) at a ratio of 7:3. Univariate analysis and least absolute shrinkage and selection operator (LASSO) analysis were performed to screen the clinical data for important feature variables associated with END. Six ML algorithms (random forest, light gradient boosting machine, decision tree, support vector machine, k-nearest neighbors and extreme gradient boosting) were used to construct predictive models. Receiver operating characteristic (ROC) curves, calibration curves and clinical decision curve analysis (DCA) were used to evaluate the performance of each ML model. The SHAP method was used to interpret the optimal ML model.
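As an illustration of the 7:3 random split described above, the following minimal sketch shuffles 97 placeholder patient identifiers and divides them into training and validation sets. The function name `split_7_to_3`, the seed, and the use of integer identifiers are assumptions made for this example; the abstract does not state how the split was implemented or whether it was stratified by END status.

```python
import random


def split_7_to_3(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle identifiers and split them into training and validation sets.

    Hypothetical helper: a plain (non-stratified) random split, which is
    only one possible reading of "randomly divided" in the Methods.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 97 placeholder identifiers standing in for the enrolled patients
train_ids, valid_ids = split_7_to_3(range(97))
print(len(train_ids), len(valid_ids))
```

With 97 patients, rounding 0.7 x 97 gives 68 training cases and leaves 29 for validation, matching the set sizes reported above.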
Results Among the six ML models, the random forest model was identified as the best predictive model. In the training set, it achieved an area under the curve (AUC) of 0.909, with specificity, precision, recall and F1 score of 0.873, 0.856, 0.910 and 0.825, respectively. In the validation set, its AUC was 0.915, with corresponding values of 0.824, 0.800, 0.945 and 0.834. Calibration curves and DCA showed that the random forest model had higher prediction accuracy and clinical net benefit than the other models. SHAP variable importance plots showed that the top six factors contributing to END were large-area cerebral infarction, pre-thrombolysis National Institutes of Health Stroke Scale (NIHSS) score, door-to-needle time (DNT), history of atrial fibrillation, white blood cell (WBC) level and history of diabetes.
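For readers less familiar with the reported metrics, the sketch below shows how specificity, precision, recall and F1 score follow from confusion-matrix counts, and how AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function names and all counts, labels and scores are hypothetical toy values chosen for illustration; they are not the study's data, and the abstract does not state whether the reported values were averaged across classes.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    recall = tp / (tp + fn)       # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_from_scores(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Each positive/negative pair counts 1 if the positive case scores
    higher, 0.5 for a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs, not taken from the study
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
toy_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, toy_auc)
```

The pairwise form of AUC is quadratic in the number of cases, which is acceptable for cohorts of this size; rank-based implementations are preferable for large data sets.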
Conclusion ML models can effectively predict the risk of END in patients receiving IVT, with the random forest model showing the best predictive performance. Using SHAP to visualize and interpret the model helps clinicians understand the contribution of each feature variable to the prediction, which may facilitate targeted preventive treatment strategies.
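The per-feature contributions that SHAP reports are based on Shapley values. The minimal sketch below computes exact Shapley values for a toy two-feature risk score by averaging each feature's marginal contribution over all feature orderings. It is not the SHAP library and not the study's random forest model: the function `toy_risk`, its coefficients, the feature names `nihss` and `af`, and the all-zero baseline are assumptions made purely for illustration, and replacing "absent" features with a single baseline value is a simplification of how SHAP handles background data.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features not yet "added" keep their baseline value. Enumeration is
    only feasible for a handful of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]       # switch this feature on
            value = predict(current)
            totals[name] += value - previous     # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    # Hypothetical score with an interaction term; not the study's model
    return 0.02 * x["nihss"] + 0.10 * x["af"] + 0.01 * x["nihss"] * x["af"]


patient = {"nihss": 10, "af": 1}    # hypothetical patient
reference = {"nihss": 0, "af": 0}   # hypothetical baseline
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

In this toy case the contributions sum to the difference between the patient's score and the baseline score, and the interaction term is shared equally between the two features.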