Egyptian Journal of Otolaryngology, vol.42, no.1, 2026 (ESCI, Scopus)
Objective: Artificial intelligence (AI) tools like ChatGPT are increasingly used by clinicians and patients to access medical information. However, the reliability of AI-generated content compared to expert consensus remains unclear. This research investigated the level of concordance between expert evaluations and ChatGPT-generated responses concerning tinnitus diagnosis and classification within a Delphi-based framework.

Methods: A Delphi panel of otolaryngology experts rated 38 statements on tinnitus using a 9-point Likert scale. The same statements were submitted to three separate sessions of ChatGPT-4. Agreement between ChatGPT responses and expert ratings was assessed descriptively.

Results: Of the 38 statements, both ChatGPT and the expert panel classified 21 as consensus and 7 as no consensus. Discrepancies were found in 10 statements, with ChatGPT more frequently assigning a consensus rating to statements where experts did not. Overall, agreement was observed in 28 out of 38 statements (73.7%).

Conclusion: ChatGPT showed substantial agreement with expert consensus in well-established areas. However, it tended to offer definitive answers even in areas of clinical uncertainty, underscoring the need for expert supervision when using AI in medical contexts. These findings support the cautious integration of AI tools like ChatGPT into clinical otolaryngology.

Level of evidence: N/A.
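
The overall agreement figure reported in the Results can be reproduced from the stated counts. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and are not taken from the study, and the abstract does not state the consensus threshold applied to the 9-point Likert ratings, so no classification step is shown.

```python
# Counts as reported in the Results of the abstract.
both_consensus = 21      # classified as consensus by experts and ChatGPT
both_no_consensus = 7    # classified as no consensus by experts and ChatGPT
discrepant = 10          # statements where the two classifications differed

total = both_consensus + both_no_consensus + discrepant
agreed = both_consensus + both_no_consensus

# Simple percent agreement (a descriptive measure, not chance-corrected).
percent_agreement = 100 * agreed / total

print(f"{agreed}/{total} = {percent_agreement:.1f}%")  # prints "28/38 = 73.7%"
```

This is plain percent agreement, in line with the descriptive assessment named in the Methods; it does not adjust for agreement expected by chance.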