Ebru KARAKAYA GOJAYEV, Zahide Çiler BÜYÜKATALAY
The Turkish Journal of Ear Nose and Throat - 2025;35(4):171-177
Objective: To evaluate the diagnostic performance of a general-purpose artificial intelligence (AI) model in classifying vocal fold lesions from static laryngeal images.

Materials and Methods: This retrospective study included 175 cases representing 14 vocal fold pathologies. Two static endoscopic frames per case, captured during inspiration and phonation, were analysed by a GPT-4-based AI model using structured diagnostic prompts. The model had no prior training on laryngeal images. Diagnostic accuracy, sensitivity, specificity, precision, and F1-score were calculated. A chi-square test was used to compare the observed accuracy with chance.

Results: The overall diagnostic accuracy was 29.14%. The model achieved perfect accuracy (100%) for vocal fold haemorrhage and chronic fungal laryngitis, but failed to identify vocal fold paralysis and leukoplakia. Sensitivity ranged from 0% to 100%, whereas specificity was more stable (66% to 75%). The macro-average and weighted-average F1-scores were 33.38% and 29.14%, respectively. The model performed significantly better than chance (p<0.001), although performance varied substantially across diagnoses.

Conclusion: Although performance was inconsistent across pathologies, the model achieved high diagnostic accuracy for selected lesions. These findings support the potential of AI-assisted tools in laryngeal diagnostics, while underscoring the need for domain-specific training and validation.
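The abstract reports per-class sensitivity, specificity, precision and F1, their macro and weighted averages, and a chi-square comparison of accuracy against chance, without stating formulas. The sketch below shows one common way to compute these quantities from true and predicted labels. It is illustrative only and is not the authors' analysis code: the function names are hypothetical, macro averaging over every label seen in either list and weighting by true-class support follow the usual scikit-learn convention, and the chance level passed to the chi-square test (for example, 1/K for K classes) is an assumption, since the abstract does not say how chance was defined.

```python
import math

def per_class_metrics(y_true, y_pred):
    """Return {label: (sensitivity, specificity, precision, f1, support)} using one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    out = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0   # recall / sensitivity
        spec = tn / (tn + fp) if tn + fp else 0.0   # specificity
        prec = tp / (tp + fp) if tp + fp else 0.0   # precision
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        out[c] = (sens, spec, prec, f1, tp + fn)    # support = number of true cases of c
    return out

def summary(y_true, y_pred):
    """Overall accuracy, macro-average F1 and support-weighted F1 (assumed conventions)."""
    m = per_class_metrics(y_true, y_pred)
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    macro_f1 = sum(v[3] for v in m.values()) / len(m)   # unweighted mean over labels
    weighted_f1 = sum(v[3] * v[4] for v in m.values()) / n  # weighted by true-class support
    return acc, macro_f1, weighted_f1

def chi_square_vs_chance(n_correct, n_total, chance):
    """Goodness-of-fit chi-square (df = 1) of correct/incorrect counts against an assumed chance rate."""
    exp_correct = n_total * chance
    exp_wrong = n_total - exp_correct
    n_wrong = n_total - n_correct
    chi2 = ((n_correct - exp_correct) ** 2 / exp_correct
            + (n_wrong - exp_wrong) ** 2 / exp_wrong)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p-value of chi-square with 1 degree of freedom
    return chi2, p

# Toy example with three hypothetical classes (not study data).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
acc, macro_f1, weighted_f1 = summary(y_true, y_pred)
chi2, p = chi_square_vs_chance(4, 6, 1 / 3)
```

Because weighted F1 scales each class by its number of true cases, it tracks the common diagnoses more closely, whereas macro F1 gives every pathology equal weight. This is why the two averages can diverge when per-class performance is as uneven as reported above. With small expected counts, an exact binomial test is often preferred over the chi-square approximation.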