Esra ENSARI, Esra Nagehan Akyol ÖNDER, Pelin ERTAN
Turkish Journal of Nephrology - 2026;35(2):146-153
Background: Hypertension (HT) is a serious public health concern. Although HT is more common in adults, its incidence in children and adolescents is increasing. Natural language processing chatbots (NLPCs) are becoming increasingly popular as the use of artificial intelligence in health care continues to grow. This study evaluated the accuracy and reproducibility of answers to frequently asked questions (FAQs) about pediatric HT provided by 3 NLPCs: ChatGPT-4o, Gemini, and Copilot.

Methods: Fifty-five FAQs were collected from reputable health sources and grouped into 4 areas: basic information, diagnosis/tests, treatment, and prognosis. Each question was posed to each chatbot twice, 1 week apart. Two pediatric nephrologists independently assessed the answers against established clinical guidelines. Accuracy was rated on a 4-point scale, and reproducibility was determined by comparing the consistency of the 2 responses to each question.

Results: Across all categories, ChatGPT-4o showed the highest accuracy (91.0%) and reproducibility (91.0%). Gemini and Copilot achieved accuracies of 65.4% and 54.5%, respectively, with lower reproducibility (Gemini: 52.7%; Copilot: 41.9%). Copilot also generated a small proportion of completely inaccurate or misleading responses.

Conclusion: Of the models evaluated, ChatGPT-4o provided the most accurate and reliable information about childhood HT. NLPCs can support the education of patients and caregivers, but they should not replace professional medical advice. Incorporating clinical practice guidelines can improve the quality and safety of digital health interactions.
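As an illustration only, percentages like the accuracy and reproducibility figures in the Results could be tallied along the following lines. The abstract does not state the scoring rules, so this sketch assumes that a rating of 1 on the 4-point scale means "fully accurate" and that a question counts as reproducible when both rounds receive the same rating. The function names and the sample ratings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def accuracy_rate(scores: Sequence[int], accurate_score: int = 1) -> float:
    """Percentage of responses given the top rating.

    Assumption: a rating equal to `accurate_score` (1 here) means
    "fully accurate"; the abstract does not define the scale.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s == accurate_score)
    return round(100.0 * hits / len(scores), 1)


def reproducibility_rate(first: Sequence[int], second: Sequence[int]) -> float:
    """Percentage of questions rated identically in both rounds.

    Assumption: "reproducible" means the same rating in round 1 and round 2.
    """
    if not first or len(first) != len(second):
        raise ValueError("both rounds need the same, non-zero number of ratings")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return round(100.0 * same / len(first), 1)


# Made-up ratings for five questions, one list per round (not study data).
round_one = [1, 1, 2, 1, 3]
round_two = [1, 2, 2, 1, 3]

print(accuracy_rate(round_one))                    # 60.0
print(reproducibility_rate(round_one, round_two))  # 80.0
```

With 55 questions, each question contributes about 1.8 percentage points under this kind of tally, so small differences between reported percentages correspond to a single question.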