FRONTIERS IN ONCOLOGY, vol.1, no.1, pp.1-7, 2023 (SCI-Expanded)
Background and objective: Chat Generative Pre-trained Transformer (ChatGPT)
is an artificial intelligence (AI)-based language processing model using deep
learning to create human-like text dialogue. It has become a popular source of
information covering a vast number of topics, including medicine. Patient
education in head and neck cancer (HNC) is crucial to enhance the
understanding of patients about their medical condition, diagnosis, and
treatment options. Therefore, this study aims to examine the accuracy and
reliability of ChatGPT in answering questions regarding HNC.
Methods: A total of 154 head and neck cancer-related questions were compiled from
sources including professional societies, institutions, patient support groups, and
social media. These questions were categorized into topics like basic knowledge,
diagnosis, treatment, recovery, operative risks, complications, follow-up, and
cancer prevention. ChatGPT was queried with each question, and two
experienced head and neck surgeons assessed each response independently
for accuracy and reproducibility. Responses were rated on a four-point scale: (1)
comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate
and inaccurate/misleading, and (4) completely inaccurate/irrelevant.
Discrepancies in grading were resolved by a third reviewer. Reproducibility was
evaluated by repeating questions and analyzing grading consistency.
Results: ChatGPT yielded “comprehensive/correct” responses to 133/154
(86.4%) of the questions, whereas the rates of “incomplete/partially correct” and
“mixed with accurate and inaccurate data/misleading” responses were 11% and
2.6%, respectively. There were no “completely inaccurate/irrelevant” responses.
According to category, the model provided “comprehensive/correct” answers to
80.6% of questions regarding “basic knowledge”, 92.6% related to “diagnosis”,
88.9% related to “treatment”, 80% related to “recovery – operative risks –
complications – follow-up”, 100% related to “cancer prevention” and 92.9%
related to “other”. There was no significant difference between the
categories in the grades of ChatGPT responses (p=0.88). The rate of
reproducibility was 94.1% (145 of 154 questions).
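
As a quick arithmetic check on the grade distribution reported above, the following minimal sketch recomputes the percentages from counts. Only the count of 133 "comprehensive/correct" responses and the absence of "completely inaccurate/irrelevant" responses are stated in the abstract; the counts of 17 and 4 for the two middle grades are assumptions inferred from the reported 11% and 2.6% of 154 questions, and the names `grade_counts` and `percentage` are illustrative only.

```python
# Grade counts out of 154 questions.
# NOTE: 17 and 4 are inferred from the reported 11% and 2.6%;
# the abstract itself only states 133/154 and "no" grade-4 responses.
TOTAL = 154
grade_counts = {
    "comprehensive/correct": 133,                    # reported
    "incomplete/partially correct": 17,              # assumed from 11%
    "mixed accurate and inaccurate/misleading": 4,   # assumed from 2.6%
    "completely inaccurate/irrelevant": 0,           # reported
}


def percentage(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The four grades should account for every question exactly once.
assert sum(grade_counts.values()) == TOTAL

rates = {grade: percentage(n) for grade, n in grade_counts.items()}
for grade, rate in rates.items():
    print(f"{grade}: {rate}%")
```

Under these assumed counts, the four grades sum to 154 and the recomputed rates agree with the reported figures to one decimal place.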
Conclusion: ChatGPT generated substantially accurate and reproducible
information in response to diverse medical queries related to HNC. Despite its limitations,
it can be a useful source of information for both patients and medical
professionals. With further developments in the model, ChatGPT can also play
a crucial role in clinical decision support by providing clinicians with up-to-date
information.