AI IN PATIENT CARE: EVALUATING LARGE LANGUAGE MODEL PERFORMANCE AGAINST EVIDENCE-BASED GUIDELINES FOR PULMONARY EMBOLISM

Ömer F. Karakoyun, Halil E. Koyuncuoğlu, Ömer H. Sağnıç, Mehmed E. Özdemir, Yalçın Gölcük, Birdal Yıldırım

Thoracic Research and Practice - 2026;27(1):38-46

Clinic of Emergency Medicine, Muğla Training and Research Hospital, Muğla, Türkiye

OBJECTIVE: Artificial intelligence (AI)-driven large language models (LLMs) are increasingly used in patient education; however, their ability to interpret and apply clinical guidelines within real-world physician workflows remains uncertain. Pulmonary embolism (PE), with its well-established diagnostic and management protocols, provides a suitable model for evaluating these systems. This study assessed the performance of four widely used AI-driven LLMs (ChatGPT-4o, DeepSeek-V2, Gemini, and Grok) in applying the 2019 European Society of Cardiology guidelines for PE. The focus was on evaluating clinical accuracy, adherence to guidelines, and response consistency.

MATERIAL AND METHODS: Ten open-ended questions based on a simulated PE case were created, covering diagnosis, risk stratification, treatment, and follow-up. Guideline-based reference answers were used for scoring. LLMs were tested under identical conditions, and the responses were anonymized and scored by two emergency physicians using a 10-point scale. Inter-rater reliability was measured using the intraclass correlation coefficient (ICC), and group comparisons were made using Kruskal-Wallis tests.

RESULTS: ChatGPT-4o scored highest (76), followed by Gemini (73.75), Grok (71.25), and DeepSeek-V2 (65). No significant difference was found in total scores (P = 0.390). Performance varied by category; ChatGPT-4o excelled in follow-up, while DeepSeek-V2 performed best in diagnostics. Expert reviewers noted ChatGPT-4o's structured responses and Grok's practicality, but highlighted limitations such as insufficient personalization and guideline gaps. Inter-rater agreement was excellent (ICC: 0.986).

CONCLUSION: AI-driven LLMs show promise in supporting PE management, though none consistently excel in all domains. Further development is needed to enhance clinical integration and guideline compliance.
