
CHATGPT-4O'S PERFORMANCE IN DETECTING DIFFUSION RESTRICTION ON BRAIN MRI OF NEONATES WITH HYPOXIC-ISCHEMIC ENCEPHALOPATHY: DOES CLINICAL INFORMATION INFLUENCE DIAGNOSTIC INTERPRETATION?

Cemre ÖZENBAŞ, Burcin İŞCAN

Türkiye Çocuk Hastalıkları Dergisi - 2026;20(2):73-78

Department of Radiology, Private Buca Hospital, İzmir Tınaztepe University, İzmir, Türkiye

 

Objective: Hypoxic-ischemic encephalopathy is a major cause of neonatal morbidity and mortality, and diffusion-weighted imaging (DWI) plays a key role in its early diagnosis. With growing interest in large language models (LLMs) such as ChatGPT, it is important to evaluate their potential in radiological interpretation. This study aimed to assess the diagnostic performance of ChatGPT-4o in identifying diffusion restriction on neonatal brain DWI and to determine whether clinical information (the Thompson score) influences its diagnostic responses.

Material and Methods: This retrospective study included 36 neonates (18 with and 18 without diffusion restriction) who underwent brain DWI MRI between postnatal days 4 and 7. For each case, representative DWI and apparent diffusion coefficient (ADC) images were uploaded to ChatGPT-4o in five separate sessions. The same process was repeated after adding the Thompson score. Performance was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), odds ratio (OR), Fleiss' kappa, and the McNemar test.

Results: Without clinical information, sensitivity was 56.7%, specificity 90%, PPV 85%, and NPV 67.5% (OR: 11.77). With the Thompson score, sensitivity increased to 72.2%, specificity to 91.1%, PPV to 89%, and NPV to 76.6% (OR: 26.65). Intra-observer agreement was very high (kappa = 0.825 without vs. 0.920 with the Thompson score). The McNemar test showed a statistically significant difference (p = 0.045) after clinical data were included.

Conclusion: ChatGPT-4o showed high specificity and moderate sensitivity in detecting diffusion restriction on DWI. Clinical information significantly influenced its diagnostic responses, highlighting both the potential and the limitations of LLMs in radiology.
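The accuracy measures reported in the Results all derive from a 2x2 table of model responses against the reference diagnosis. The sketch below shows that arithmetic. The cell counts are not given in the abstract: they are back-calculated here on the assumption that the five sessions were pooled, giving 90 reads per group (18 cases x 5 sessions). The function name `diagnostic_metrics` and the variable names are illustrative only.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard diagnostic accuracy measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),        # true positives among all restricted cases
        "specificity": tn / (tn + fp),        # true negatives among all non-restricted cases
        "ppv": tp / (tp + fp),                # correct among all positive calls
        "npv": tn / (tn + fn),                # correct among all negative calls
        "odds_ratio": (tp * tn) / (fp * fn),  # diagnostic odds ratio
    }


# ASSUMPTION: counts are inferred from the reported percentages, taking
# 18 cases x 5 sessions = 90 pooled reads per group. They are not stated
# in the abstract.
without_score = diagnostic_metrics(tp=51, fn=39, fp=9, tn=81)
with_score = diagnostic_metrics(tp=65, fn=25, fp=8, tn=82)

for label, metrics in (("Without Thompson score", without_score),
                       ("With Thompson score", with_score)):
    print(label)
    for name, value in metrics.items():
        if name == "odds_ratio":
            print(f"  {name}: {value:.2f}")
        else:
            print(f"  {name}: {value * 100:.1f}%")
```

Under these assumed counts, the computed values agree with the reported figures to the stated precision, which suggests (but does not confirm) that the study pooled all sessions when calculating performance.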