Eyüpcan Şensoy, Mehmet Çıtırık
Kastamonu Medical Journal - 2025;5(4):227-230
Aims: Our study aims to investigate the effect of language differences on the success of the freely available artificial intelligence programs ChatGPT-3.5, Gemini, and Copilot in answering multiple-choice questions about retinal and vitreous diseases.

Methods: Forty-six questions related to retinal and vitreous diseases were included in the study. These questions were translated into Turkish by a certified native speaker. The questions, in both English and Turkish, were then presented one by one to ChatGPT-3.5, Gemini, and Copilot. The answers each program claimed to be correct were compared with the answer key and classified as correct or incorrect, and the programs' success rates were compared statistically.

Results: ChatGPT-3.5, Gemini, and Copilot correctly answered the English questions at rates of 54.3%, 69.6%, and 63%, respectively, and the Turkish questions at rates of 43.5%, 60.9%, and 52.2%, respectively. Although all three chatbots answered fewer Turkish questions correctly, the differences were not statistically significant (p>0.05).

Conclusion: Although no statistically significant difference was detected, the lower success rate of the chatbots on the Turkish questions suggests that, in addition to improving their medical knowledge, these programs need further development in understanding and handling translated, non-English input.
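The reported percentages can be reproduced from the raw counts (46 questions per language), and the English-versus-Turkish comparison can be sketched with a simple significance test. The abstract does not name the statistical test used, so the unpaired two-proportion z-test below is an assumption for illustration only; with the paired per-question responses available, a paired test such as McNemar's would be the conventional choice.

```python
# Sketch, not the paper's actual analysis: derive correct-answer counts
# from the reported rates and check that the English vs Turkish gap is
# not significant at the 0.05 level for any chatbot.
import math

N = 46  # questions per language

# Correct-answer counts implied by the reported percentages
counts = {
    "ChatGPT-3.5": {"en": 25, "tr": 20},  # 54.3% / 43.5%
    "Gemini":      {"en": 32, "tr": 28},  # 69.6% / 60.9%
    "Copilot":     {"en": 29, "tr": 24},  # 63.0% / 52.2%
}

def two_proportion_p(k1: int, k2: int, n: int) -> float:
    """Two-sided p-value of an unpaired two-proportion z-test
    comparing k1/n vs k2/n (assumed test, for illustration)."""
    p1, p2 = k1 / n, k2 / n
    pooled = (k1 + k2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    # Normal-approximation tail probability via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

for bot, c in counts.items():
    p = two_proportion_p(c["en"], c["tr"], N)
    print(f"{bot}: en {c['en']/N:.1%}, tr {c['tr']/N:.1%}, p={p:.3f}")
```

Under this sketch, every p-value exceeds 0.05, consistent with the abstract's report that the drop in Turkish performance did not reach statistical significance.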