DEVELOPMENT OF A MACHINE LEARNING MODEL FOR PREDICTING MORTALITY RISK IN THORACIC SURGERY PATIENTS: A SYNTHETIC DATA STUDY

Okan KARATAS, Nilay Cavusoglu YALCIN, Muharrem OZKAYA, Gungor Efe OZKAYA

Current Thoracic Surgery - 2026;11(1):7-13

Department of Thoracic Surgery, University of Health Sciences, Antalya Training and Research Hospital, Antalya, Türkiye

Background: Accurate prediction of postoperative mortality is crucial in thoracic surgery to guide clinical decisions and optimize patient care. This study demonstrates how to develop a reproducible machine learning (ML) model for predicting 30-day postoperative mortality in thoracic surgery patients, using a realistic synthetic dataset and clinically relevant perioperative variables.

Materials and Methods: We generated a realistic synthetic cohort of 1000 thoracic (lung) surgery patients to illustrate the approach. Predictors included patient risk factors (age, sex, comorbidities, pulmonary function, etc.) and surgery type (wedge/segmentectomy, lobectomy, or pneumonectomy). We compared logistic regression, random forest, and XGBoost algorithms.

Results: Logistic regression performed best of the three algorithms (AUC=0.82548). The model also demonstrated good calibration (Brier score=0.040), indicating accurate probability estimation. Key predictors were advanced age, high ASA class, reduced FEV1, presence of COPD, and pneumonectomy, consistent with known clinical risk factors. Model steps are detailed below, and an appendix provides Python code for data generation and model training.

Conclusion: This work demonstrates a reproducible pipeline for risk prediction in thoracic surgery using a synthetic dataset, in line with modern approaches such as the STS risk calculators and advanced ML models (e.g., Predicthor). The model is intended for methodological demonstration only and is not clinically validated for real-world use.
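
To make the workflow summarized above concrete, the following is a minimal sketch of the same kind of pipeline: simulate a synthetic cohort, fit a logistic regression, and score it with AUC and the Brier score. It is not the authors' appendix code. The function names (`simulate_cohort`, `fit_logistic`, `run_demo`), the four predictors, their distributions, and every coefficient are illustrative assumptions, and the sketch uses only the Python standard library with plain gradient descent, whereas a real analysis would more likely rely on an established ML library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_cohort(n=1000, seed=0):
    """Draw a synthetic cohort. All distributions and coefficients are assumed."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.gauss(62, 10)                       # years (assumed)
        fev1 = rng.gauss(80, 18)                      # % predicted (assumed)
        copd = 1 if rng.random() < 0.25 else 0        # assumed prevalence
        pneumonectomy = 1 if rng.random() < 0.10 else 0
        # Hypothetical log-odds of 30-day mortality
        logit = (-3.5 + 0.05 * (age - 62) - 0.03 * (fev1 - 80)
                 + 0.6 * copd + 1.0 * pneumonectomy)
        # Store standardized continuous features and raw binary flags
        X.append([(age - 62) / 10, (fev1 - 80) / 18, copd, pneumonectomy])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Unregularized logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi              # gradient of the log-loss
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def auc(y_true, y_prob):
    """Probability that a random death is ranked above a random survivor."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def run_demo(n=1000, seed=0):
    X, y = simulate_cohort(n, seed)
    split = int(0.7 * n)                              # 70/30 train/test split
    w, b = fit_logistic(X[:split], y[:split])
    probs = [predict(w, b, x) for x in X[split:]]
    return auc(y[split:], probs), brier(y[split:], probs)
```

Calling `run_demo()` returns the held-out AUC and Brier score for one simulated cohort. Because 30-day mortality is rare, the test split contains few events, so both metrics will vary noticeably with the seed; the numbers from this sketch are not comparable to the values reported in the abstract.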