Construction of prognostic model for intravenous thrombolysis in acute ischemic stroke based on interpretable machine learning

李娟; 祁冬; 庄雷; 司峥

doi:10.7619/jcmp.20245918

Construction of prognostic model for intravenous thrombolysis in acute ischemic stroke based on interpretable machine learning

Abstract:
Objective To construct a machine learning (ML) model for predicting early neurological deterioration (END) after intravenous thrombolysis (IVT) in patients with acute ischemic stroke (AIS), and to analyze the risk factors for END using Shapley additive explanations (SHAP).
Methods A total of 97 AIS patients who received IVT were enrolled. Patients were divided into an END group (18 cases) and a non-END group (79 cases) according to whether END occurred within 24 hours after IVT. All patients were randomly divided into a training set (n=68) and a validation set (n=29) at a ratio of 7:3. Univariate analysis and least absolute shrinkage and selection operator (LASSO) analysis of the clinical data were used to screen for important feature variables associated with END. Six ML algorithms (random forest, light gradient boosting machine, decision tree, support vector machine, k-nearest neighbors and extreme gradient boosting) were used to construct predictive models. Receiver operating characteristic (ROC) curves, calibration curves and decision curve analysis (DCA) were used to evaluate the performance of each ML model, and the SHAP method was used to interpret the best-performing model.
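The abstract does not report the software, random seed, or whether the split was stratified, so the following is only a minimal sketch of two steps named above: a random 7:3 split and the AUC used to summarize a ROC curve. The helper names `split_7_3` and `roc_auc` are hypothetical, and the plain shuffle is an assumption rather than the authors' procedure.

```python
import random


def split_7_3(ids, seed=0):
    """Shuffle the ids and split them into about 70% training and 30% validation."""
    rng = random.Random(seed)  # seed is an assumption; the abstract gives none
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# 97 placeholder patient indices, matching the cohort size in the abstract.
train_ids, valid_ids = split_7_3(range(97))
print(len(train_ids), len(valid_ids))  # 68 29
```

With 97 patients, rounding 70% gives 68 training and 29 validation cases, which agrees with the set sizes reported above. With only 18 END cases, a stratified split would keep the END proportion similar in both sets, but the abstract does not say whether one was used.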
Results Among the six ML models, the random forest model was the best predictive model. In the training set, it achieved an area under the curve (AUC) of 0.909, with specificity, precision, recall and F1 score of 0.873, 0.856, 0.910 and 0.825, respectively. In the validation set, its AUC was 0.915, with specificity, precision, recall and F1 score of 0.824, 0.800, 0.945 and 0.834, respectively. Calibration curves and DCA showed that the random forest model had higher predictive accuracy and clinical net benefit than the other models. The SHAP variable importance plot showed that the six factors contributing most to END were, in order, large-area cerebral infarction, pre-thrombolysis National Institutes of Health Stroke Scale (NIHSS) score, door-to-needle time (DNT), history of atrial fibrillation, white blood cell (WBC) level and history of diabetes.
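For readers less familiar with the metrics reported above, the sketch below shows the standard definitions of specificity, precision, recall and F1 score from a confusion matrix, together with the usual net-benefit formula behind a decision curve. The function names and the toy counts are hypothetical and are not taken from the study, which does not report its confusion matrices or averaging scheme in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy counts for illustration only (30 hypothetical patients).
toy = classification_metrics(tp=8, fp=2, tn=18, fn=2)
toy_nb = net_benefit(tp=8, fp=2, n=30, threshold=0.2)
```

A decision curve plots this net benefit over a range of threshold probabilities and compares it with the "treat all" and "treat none" strategies.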
Conclusion ML models can effectively predict the risk of END in patients receiving IVT, with the random forest model showing the best predictive performance. Visualizing and interpreting the model with SHAP helps clinicians understand how much each feature variable contributes to the prediction, which can support targeted preventive treatment.
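To illustrate what "contribution of each feature variable" means in SHAP, here is a brute-force sketch of exact Shapley values for a single prediction. It is not the authors' code: in practice a library explainer would be applied to the fitted random forest, whereas `toy_risk`, its coefficients, and the feature values below are invented purely for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction: the average marginal change in
    predict() when each feature is switched from its baseline value to its
    observed value, taken over every feature ordering."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            new = predict(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the study model)."""
    large_infarct, nihss, dnt_minutes = features
    return (0.05 + 0.30 * large_infarct + 0.01 * nihss
            + 0.001 * dnt_minutes + 0.01 * large_infarct * nihss)


patient = [1, 10, 60]    # hypothetical patient: large infarct, NIHSS 10, DNT 60 min
reference = [0, 4, 40]   # hypothetical baseline values
phi = shapley_values(toy_risk, patient, reference)
```

The contributions always sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that gives SHAP its name. Enumerating all orderings is only feasible for a handful of features.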