Türk Medline

EVALUATION OF THE ACCURACY, RELIABILITY, QUALITY, AND READABILITY OF ARTIFICIAL INTELLIGENCE CHATBOTS-GENERATED RESPONSES TO ACNE-RELATED QUESTIONS

Ecem Bostan, Mahmut Talha Uçar, Elif Dönmez

Turkish Journal of Dermatology - 2025;19(4):235-243

Department of Dermatology and Venereology, Ankara Medipol University Faculty of Medicine, Ankara, Türkiye

 

Aim: Since artificial intelligence (AI) entered our lives, it has been widely used in daily medical practice to support accurate diagnosis, predict prognosis, and provide information about various treatment modalities. Acne vulgaris is one of the most frequently encountered skin problems in dermatology, and patients with acne may consult AI chatbots. The aim of the present study was to evaluate the accuracy, reliability, quality, and readability of AI-generated responses to frequently asked acne-related questions.

Materials and Methods: A multi-domain assessment approach involving four validated tools [modified DISCERN, Global Quality Scale (GQS), Flesch Reading Ease score (FRES), and a 5-point Likert scale] was used to evaluate the responses of three generative AI chatbots (DeepSeek, ChatGPT-4.0, and ChatGPT-4.5) to acne-related queries.

Results: DeepSeek achieved the highest mean FRES among the three generative AI chatbots, followed by ChatGPT-4.0 and ChatGPT-4.5. ChatGPT-4.5 achieved the highest mean modified DISCERN score, followed by ChatGPT-4.0 and DeepSeek, indicating superior information quality in ChatGPT-4.5 responses. In terms of accuracy, ChatGPT-4.5 again achieved the highest mean score. ChatGPT-4.5 also scored the highest on the GQS, slightly above ChatGPT-4.0, with DeepSeek scoring the lowest.

Conclusion: ChatGPT-4.5 generally provided more accurate, higher-quality responses, whereas DeepSeek offered superior readability according to the Flesch Reading Ease metric.
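For readers unfamiliar with the readability metric, the Flesch Reading Ease score is conventionally computed as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), with higher values indicating text that is easier to read. The abstract does not state which software the authors used, so the Python sketch below is only an illustration. The function names and the vowel-group syllable counter are assumptions made for this example; dedicated readability tools count syllables more carefully and may give different values.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the authors' method):
    count runs of consecutive vowels and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula applied to English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's chatbot responses.
    simple = "Wash your face twice a day. Do not pick at spots."
    dense = "Isotretinoin administration necessitates comprehensive monitoring."
    print(round(flesch_reading_ease(simple), 1))
    print(round(flesch_reading_ease(dense), 1))
```

Short sentences built from short words score high, while long, polysyllabic sentences score low or even below zero, which is why a chatbot can rank first on readability while ranking lower on information quality.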