Türk Medline

INTEROBSERVER RELIABILITY OF THE KELLGREN-LAWRENCE CLASSIFICATION IN KNEE OSTEOARTHRITIS: A COMPARISON BETWEEN ORTHOPEDIC SURGEONS AND ARTIFICIAL INTELLIGENCE

Şafak SAYAR, Mustafa BOZ, Yasemin Begüm TOPKARCI, Suat BATAR, Necdet DEMİR

Eurasian Journal of Medical Investigation - 2026;10(1):80-83

Department of Orthopaedics and Traumatology, Biruni University Faculty of Medicine Hospital, Istanbul

Objectives: To evaluate the interobserver reliability of the Kellgren-Lawrence (KL) classification among orthopedic surgeons and to compare their assessments with those of artificial intelligence (AI) systems.

Methods: One hundred anteroposterior weight-bearing knee radiographs from patients aged 65 years and older were retrospectively analyzed. Four orthopedic surgeons and two AI systems (ChatGPT and Gemini) independently graded all radiographs according to the KL classification, blinded to clinical information and to each other's evaluations. Interobserver agreement was assessed using quadratically weighted Cohen's kappa and intraclass correlation coefficients (ICC).

Results: Interobserver agreement among the orthopedic surgeons was good (mean weighted kappa = 0.780; ICC = 0.784). Agreement between the orthopedic consensus and ChatGPT was moderate (kappa = 0.481), whereas agreement with Gemini was moderate to good (kappa = 0.561). Agreement between the two AI systems was also moderate (kappa = 0.484).

Conclusion: The KL classification showed good interobserver reliability among orthopedic surgeons. The AI systems showed moderate agreement with orthopedic experts and may serve as supportive screening tools rather than as diagnostic replacements.
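The abstract reports agreement as a quadratically weighted Cohen's kappa on the five ordinal KL grades (0 to 4). The Python sketch below illustrates how that statistic is computed. It is not the authors' analysis code. It assumes that the reported "mean weighted kappa" is the average over all pairs of raters. The function names `quadratic_weighted_kappa` and `mean_pairwise_kappa` are hypothetical, and the grades in the usage lines are toy values, not study data.

```python
from itertools import combinations


def quadratic_weighted_kappa(rater_a, rater_b, n_grades=5):
    """Quadratically weighted Cohen's kappa for two raters' ordinal grades 0..n_grades-1."""
    n = len(rater_a)
    # Observed joint proportions: rows = rater A's grade, columns = rater B's grade
    observed = [[0.0] * n_grades for _ in range(n_grades)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1.0 / n
    # Marginal grade distribution of each rater
    row_marg = [sum(observed[i]) for i in range(n_grades)]
    col_marg = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]

    obs_disagree = 0.0
    exp_disagree = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # Quadratic weight: 0 for exact agreement, 1 for the largest gap (grade 0 vs 4)
            w = (i - j) ** 2 / (n_grades - 1) ** 2
            obs_disagree += w * observed[i][j]
            # Disagreement expected by chance if the raters graded independently
            exp_disagree += w * row_marg[i] * col_marg[j]

    if exp_disagree == 0:
        raise ValueError("kappa is undefined when both raters assign one identical grade")
    return 1.0 - obs_disagree / exp_disagree


def mean_pairwise_kappa(raters):
    """Average quadratic weighted kappa over every pair of raters (assumed averaging scheme)."""
    pairs = list(combinations(raters, 2))
    return sum(quadratic_weighted_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy illustration only (hypothetical grades, not study data)
grades = [0, 1, 2, 3, 4]
print(round(quadratic_weighted_kappa(grades, grades), 6))        # 1.0 (perfect agreement)
print(round(quadratic_weighted_kappa(grades, grades[::-1]), 6))  # -1.0 (systematic reversal)
```

The quadratic weights penalize a disagreement of two grades four times as heavily as a disagreement of one grade. This is why the weighted kappa suits an ordinal scale such as KL, where grading a knee 2 instead of 3 is a smaller error than grading it 0 instead of 4.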