Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases

Kozel G., Gurses M. E., Gecici N. N., Gökalp E., Bahadir S., Merenzon M. A., ...

Clinical Neurology and Neurosurgery, vol.239, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 239
  • Publication Date: 2024
  • Doi Number: 10.1016/j.clineuro.2024.108238
  • Journal Name: Clinical Neurology and Neurosurgery
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, CAB Abstracts, EMBASE
  • Keywords: Artificial intelligence, ChatGPT 3.5, ChatGPT 4, Neurosurgery, Neurosurgical Treatment
  • Hacettepe University Affiliated: Yes


Objective: Assess the capabilities of ChatGPT-3.5 and 4 to provide accurate diagnoses, treatment options, and treatment plans for brain tumors in example neuro-oncology cases.

Methods: ChatGPT-3.5 and 4 were provided with twenty example neuro-oncology cases of brain tumors, all selected from medical textbooks. The artificial intelligence programs were asked to give a diagnosis, treatment option, and treatment plan for each of these twenty example cases. Team members first determined in which cases ChatGPT-3.5 and 4 provided the correct diagnosis or treatment plan. Twenty neurosurgeons from the researchers’ institution then independently rated the diagnoses, treatment options, and treatment plans provided by both artificial intelligence programs for each of the twenty example cases, on a scale of one to ten, with ten being the highest score. To determine whether the difference between the scores of ChatGPT-3.5 and 4 was statistically significant, a paired t-test was conducted for the average scores given to the programs for each example case.

Results: In the initial analysis of correct responses, ChatGPT-4 had an accuracy of 85% for its diagnoses of example brain tumors and an accuracy of 75% for its provided treatment plans, while ChatGPT-3.5 only had an accuracy of 65% and 10%, respectively. The average scores given by the twenty independent neurosurgeons to ChatGPT-4 for its accuracy of diagnosis, provided treatment options, and provided treatment plan were 8.3, 8.4, and 8.5 out of 10, respectively, while ChatGPT-3.5’s average scores for these categories of assessment were 5.9, 5.7, and 5.7. These differences in average score are statistically significant on a paired t-test, with a p-value of less than 0.001 for each difference.

Conclusions: ChatGPT-4 demonstrates great promise as a diagnostic tool for brain tumors in neuro-oncology, as attested to by the program's performance in this study and its assessment by surveyed neurosurgeon reviewers.
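
The paired t-test named in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `paired_t_statistic` and the two score lists are hypothetical, and the numbers are placeholders rather than data from the study (which used twenty cases, not five). The sketch computes only the t statistic and degrees of freedom using the Python standard library.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Each position in scores_a and scores_b must refer to the same case,
    e.g. the average reviewer rating of two programs on that case.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")

    # Work on the per-case differences, which is what makes the test "paired".
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-case average ratings, NOT values from the study.
gpt4_case_means = [8.0, 9.0, 7.0, 8.0, 9.0]
gpt35_case_means = [6.0, 6.0, 5.0, 7.0, 6.0]

t_stat, dof = paired_t_statistic(gpt4_case_means, gpt35_case_means)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A p-value would then be read from the t distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` performs the same test and reports the p-value directly.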