Ümmügülsüm COŞKUN, Ayşegül ERTEN TAYŞİ
Clinical and Experimental Health Sciences - 2025;15(4):866-874
Objective: This study aimed to evaluate the performance of artificial intelligence (AI)-based large language models (LLMs) in providing medical treatment recommendations for clinical scenarios in dental practice, focusing on rational prescribing and drug interaction management.

Methods: Forty standardized clinical case questions were developed by experienced oral and maxillofacial surgeons and submitted to ChatGPT-3.5 (OpenAI), ChatGPT-4o (OpenAI), and Gemini 2.5 Flash (Google DeepMind). Responses were generated in Turkish using standardized prompts and were independently assessed by two blinded evaluators against three criteria: modified Global Quality Score (GQS), accuracy, and completeness. Data were analyzed statistically using non-parametric methods.

Results: Gemini 2.5 Flash achieved the highest performance on all criteria, with 45% of its responses rated as very high quality (GQS score 5), 67.5% rated as highly accurate (scores of 5-6), and 42.5% rated as complete. ChatGPT-4o outperformed ChatGPT-3.5 on all parameters but did not differ significantly from Gemini. Statistically significant differences were observed between ChatGPT-3.5 and Gemini for GQS (p<.001), accuracy (p=.007), and completeness (p=.001).

Conclusion: AI-based chatbots, particularly Gemini 2.5 Flash, demonstrated promising capabilities in providing accurate and complete prescribing suggestions for dental scenarios. These findings suggest that advanced LLMs may serve as valuable clinical decision-support tools and educational aids for newly graduated dentists. However, their use should be supervised by experienced professionals to reduce the risks associated with incomplete or incorrect information.