Türk Medline

EVALUATION OF SURGICAL RECOMMENDATION ALGORITHMS IN HALLUX VALGUS CASES USING CHATGPT MODELS: AN ARTIFICIAL INTELLIGENCE APPROACH BASED ON 50 SIMULATED SCENARIOS

Ömer Levent KARADAMAR, Ali AYDİLEK

Acta Orthopaedica et Traumatologica Turcica - 2026;60(2):1-8

Department of Orthopedics and Traumatology, Döşemealtı State Hospital, Antalya, Türkiye

Objective: The integration of large language models (LLMs) into orthopedic surgical decision-making is a growing area of research. This study aimed to compare the surgical recommendation capabilities of the ChatGPT-4.0, ChatGPT-4o, and ChatGPT-5 models in simulated cases of hallux valgus deformity.

Methods: A total of 50 simulated cases were constructed from fundamental clinical data related to hallux valgus pathology, and each case was submitted individually to each of the 3 models. For each case, a surgical recommendation and a brief rationale were obtained. The responses were compared on content alignment, adequacy of justification, and consistency with established surgical algorithms. In addition, the textual outputs were evaluated using DISCERN and multiple readability indices.

Results: All 3 models demonstrated overall algorithmic consistency. However, ChatGPT-5 provided the most context-aware and clinically consistent recommendations, followed by ChatGPT-4o and ChatGPT-4.0. Concordance rates with surgical algorithms were 70% for ChatGPT-4.0, 82% for ChatGPT-4o, and 90% for ChatGPT-5. DISCERN scores were 51, 56, and 62, respectively. ChatGPT-5 also achieved superior performance across readability metrics, including the Flesch Reading Ease Score, the Simple Measure of Gobbledygook (SMOG), and the Gunning-Fog Index, indicating improved textual clarity and reduced complexity.

Conclusion: ChatGPT-5 demonstrated the highest contextual accuracy, consistency, and readability in surgical decision support for hallux valgus surgery, highlighting the progressive improvement of newer LLMs. Nonetheless, ChatGPT-4.0 and ChatGPT-4o were also compatible with surgical algorithms, indicating potential utility in generating patient education and informational content.
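For readers unfamiliar with the three readability indices named in the Results, the sketch below shows the standard published formulas for each. The abstract does not state which software or formula variants the authors used, so this is an illustration under that assumption rather than a reproduction of their method. The function names are hypothetical, and the word, sentence, and syllable counts are assumed to be supplied by the caller.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning-Fog Index: approximate years of schooling needed.

    complex_words counts words of three or more syllables.
    """
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade, with the polysyllable count normalised to 30 sentences."""
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(gunning_fog(words=100, sentences=5, complex_words=10))
    print(smog_grade(polysyllables=16, sentences=30))
```

The three indices point in opposite directions: a higher Flesch Reading Ease Score means easier text, whereas lower SMOG and Gunning-Fog values mean easier text. This is consistent with the abstract describing ChatGPT-5's results as reduced complexity.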