Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer?

Creative Commons License

Kuşcu O., Pamuk A. E., Sütay Süslü N., Hoşal Ş.

FRONTIERS IN ONCOLOGY, vol.1, no.1, pp.1-7, 2023 (SCI-Expanded)

  • Publication Type: Article / Article
  • Volume: 1 Issue: 1
  • Publication Date: 2023
  • Doi Number: 10.3389/fonc.2023.1256459
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED)
  • Page Numbers: pp.1-7
  • Hacettepe University Affiliated: Yes


Background and objective: Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI)-based language processing model that uses deep learning to create human-like text dialogue. It has become a popular source of information covering a vast number of topics, including medicine. Patient education in head and neck cancer (HNC) is crucial to enhance patients' understanding of their medical condition, diagnosis, and treatment options. Therefore, this study aims to examine the accuracy and reliability of ChatGPT in answering questions regarding HNC.

Methods: A total of 154 head and neck cancer-related questions were compiled from sources including professional societies, institutions, patient support groups, and social media. These questions were categorized into topics such as basic knowledge, diagnosis, treatment, recovery, operative risks, complications, follow-up, and cancer prevention. ChatGPT was queried with each question, and two experienced head and neck surgeons independently assessed each response for accuracy and reproducibility. Responses were rated on a four-point scale: (1) comprehensive/correct, (2) incomplete/partially correct, (3) a mix of accurate and inaccurate/misleading, and (4) completely inaccurate/irrelevant. Discrepancies in grading were resolved by a third reviewer. Reproducibility was evaluated by repeating questions and analyzing grading consistency.
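
The grading workflow described in the Methods can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names `final_grade` and `reproducibility_rate` are hypothetical, and two details are assumptions because the abstract does not specify them, namely that the third reviewer's grade simply replaces the two discordant grades, and that a "consistent" repeat means an identical grade on both runs.

```python
from typing import Optional, Sequence

# The four-point scale quoted in the Methods.
GRADES = {
    1: "comprehensive/correct",
    2: "incomplete/partially correct",
    3: "mix of accurate and inaccurate/misleading",
    4: "completely inaccurate/irrelevant",
}


def final_grade(rater_a: int, rater_b: int, third: Optional[int] = None) -> int:
    """Return the agreed grade, or the third reviewer's grade on disagreement.

    Assumption: the third reviewer's grade is taken as final; the abstract
    only says that discrepancies "were resolved by a third reviewer".
    """
    for grade in (rater_a, rater_b):
        if grade not in GRADES:
            raise ValueError(f"unknown grade: {grade}")
    if rater_a == rater_b:
        return rater_a
    if third not in GRADES:
        raise ValueError("raters disagree; a valid third-reviewer grade is required")
    return third


def reproducibility_rate(first_run: Sequence[int], repeat_run: Sequence[int]) -> float:
    """Fraction of questions graded identically on the first and repeated query.

    Assumption: "grading consistency" is read as an exact match of grades.
    """
    if not first_run or len(first_run) != len(repeat_run):
        raise ValueError("both runs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first_run, repeat_run))
    return matches / len(first_run)
```

For example, `final_grade(1, 2, third=2)` returns the third reviewer's grade, and `reproducibility_rate([1, 1, 2, 3], [1, 2, 2, 3])` gives 0.75 for four made-up questions.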

Results: ChatGPT yielded comprehensive/correct responses to 133/154 (86.4%) of the questions, whereas the rates of incomplete/partially correct responses and of responses mixing accurate and inaccurate/misleading data were 11% and 2.6%, respectively. There were no completely inaccurate/irrelevant responses. By category, the model provided comprehensive/correct answers to 80.6% of questions regarding basic knowledge; 92.6% related to diagnosis; 88.9% related to treatment; 80% related to recovery, operative risks, complications, and follow-up; 100% related to cancer prevention; and 92.9% related to other topics. There was no significant difference between the categories in the grades of ChatGPT responses (p=0.88). The rate of reproducibility was 94.1% (145 of 154 questions).
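
The overall grade distribution can be checked with simple arithmetic. The sketch below assumes counts of 17 and 4 for the second and third grades. These counts are not stated in the abstract; they are inferred as the only split of the remaining 21 questions that matches the reported 11% and 2.6%. The helper name `percent` is hypothetical.

```python
TOTAL_QUESTIONS = 154


def percent(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Only 133/154 is given as a count in the abstract.
# The counts 17 and 4 are inferred from the reported 11% and 2.6% (assumption).
grade_counts = {
    "comprehensive/correct": 133,
    "incomplete/partially correct": 17,
    "mix of accurate and inaccurate/misleading": 4,
    "completely inaccurate/irrelevant": 0,
}

# The four grades should account for every question exactly once.
assert sum(grade_counts.values()) == TOTAL_QUESTIONS

for label, count in grade_counts.items():
    print(f"{label}: {count}/{TOTAL_QUESTIONS} = {percent(count)}%")
```

Under these assumed counts the percentages come out to 86.4, 11.0, 2.6, and 0.0, which agrees with the figures reported above.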

Conclusion: ChatGPT generated substantially accurate and reproducible information in response to diverse medical queries related to HNC. Despite its limitations, it can be a useful source of information for both patients and medical professionals. With further developments in the model, ChatGPT can also play a crucial role in clinical decision support to provide clinicians with up-to-date