Objective To develop and validate prediction models built with five machine learning algorithms for assessing the risk of postoperative pulmonary infection (PPI) in lung cancer patients undergoing grade Ⅳ thoracoscopic surgery.
Methods This retrospective cohort study included 2,380 lung cancer patients who underwent grade Ⅳ thoracoscopic surgery at a tertiary hospital in Shanghai between January 2022 and June 2024. Patients were allocated to a training cohort (n=1,665) and a validation cohort (n=715). Five machine learning algorithms (logistic regression [LR], artificial neural network [ANN], support vector machine [SVM], random forest [RF], and extreme gradient boosting [XGB]) were used to construct predictive models, and a nomogram was developed for clinical use.
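The cohort sizes (1,665 and 715 of 2,380) correspond to roughly a 70:30 split. The abstract does not state how patients were allocated, so the sketch below assumes an outcome-stratified random split, which keeps the PPI rate similar in both cohorts. The helper name `split_cohort`, the 0.7 fraction, and the seed are illustrative assumptions, not the authors' procedure.

```python
import random

def split_cohort(patient_ids, outcomes, train_frac=0.7, seed=2024):
    """Hypothetical helper: outcome-stratified random train/validation split.

    outcomes[i] is the 0/1 PPI label for patient_ids[i]. Each outcome class is
    shuffled and cut separately so both cohorts keep a similar event rate.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid = [], []
    for cls in sorted(set(outcomes)):
        ids = [pid for pid, y in zip(patient_ids, outcomes) if y == cls]
        rng.shuffle(ids)
        cut = round(len(ids) * train_frac)
        train.extend(ids[:cut])
        valid.extend(ids[cut:])
    return train, valid

# Synthetic example using the abstract's overall counts (226 events in 2,380).
ids = list(range(2380))
labels = [1] * 226 + [0] * 2154
train_ids, valid_ids = split_cohort(ids, labels)
print(len(train_ids), len(valid_ids))
```

Because rounding is applied per class, this assumed split yields 1,666 training patients rather than the reported 1,665; the exact allocation rule used in the study is not given.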
Results Of the 2,380 patients, 226 (9.5%) developed PPI. Least absolute shrinkage and selection operator (LASSO) regression identified eight predictors: daily cigarette consumption, history of diabetes, preoperative diffusing capacity, maximal tumor diameter, 24-hour postoperative chest drainage volume, perioperative oral nutritional supplementation (ONS), postoperative urinary catheterization, and severity of intraoperative pleural adhesion. All models showed strong discrimination, with area under the receiver operating characteristic curve (AUC) values ranging from 0.862 to 0.947. The XGB model performed best (AUC=0.947, 95% CI: 0.937 to 0.962), followed closely by the LR model (AUC=0.926, 95% CI: 0.918 to 0.933).
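The AUC values above measure discrimination: the probability that a randomly chosen patient who developed PPI receives a higher predicted risk than one who did not. The sketch below computes AUC from that definition via the Mann-Whitney rank statistic and adds a percentile bootstrap confidence interval. The abstract does not say how its 95% CIs were obtained (DeLong's method is another common choice), so the bootstrap, the function names, and the parameters are assumptions for illustration only.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (labels are 0/1)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Assign 1-based ranks to the sorted scores, averaging ranks within ties.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U statistic for the positive class, normalised to [0, 1].
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Assumed method: percentile bootstrap CI for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no bootstrap resample contained both classes")
    aucs.sort()
    lo = aucs[int(alpha / 2 * len(aucs))]
    hi = aucs[int((1 - alpha / 2) * len(aucs)) - 1]
    return lo, hi
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive/negative pairs are ranked correctly.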
Conclusion Machine learning models effectively stratify PPI risk in lung cancer patients after grade Ⅳ thoracoscopic surgery. The derived nomogram offers healthcare providers a practical tool for perioperative risk management.