A Novel Ensemble Framework for Comprehensive Early-Stage Colorectal Cancer Diagnosis, Prognosis, and Treatment: Integration of Gastroenterology-Specific Transformer Language Models and Multiple Decision Trees


Şimşek C., Yalçın Ş., Üçdal M. T., Karakoç D.

JOURNAL OF CANCER POLICY, vol.100535, pp.1, 2024 (ESCI)

  • Publication Type: Article
  • Volume: 100535
  • Publication Date: 2024
  • Doi Number: 10.1016/j.jcpo.2024.100535
  • Journal Name: JOURNAL OF CANCER POLICY
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, CAB Abstracts, MEDLINE
  • Page Numbers: pp.1
  • Hacettepe University Affiliated: Yes

Abstract

Background

Colorectal cancer (CRC) remains a significant global health burden, with early detection and intervention crucial for improving patient outcomes. This study aims to develop and evaluate a novel proof-of-concept ensemble framework combining transformer-based language models and decision tree-based models for early-stage CRC screening, diagnosis, and prognosis.

Methods

The ensemble framework consists of four key components: (1) GastroGPT, a transformer-based language model, for extracting relevant data points from patient histories; (2) a decision tree-based model for assessing CRC risk and recommending colonoscopy; (3) GastroGPT again, for extracting data points from the histories of patients with early CRC; and (4) a suite of decision tree-based models for predicting survival outcomes in early-stage CRC patients. The study employed a retrospective, observational, methodological design using simulated patient cases (Fig. 1).

Results

GastroGPT demonstrated high accuracy in extracting relevant data points from patient histories. The decision tree-based model for CRC risk assessment achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.85 (95% CI: 0.78-0.92) in predicting the need for colonoscopy (Graph 1). The decision tree-based models for survival prediction showed strong performance, with C-indices ranging from 0.71 to 0.75 for overall survival and disease-free survival at 24, 36, and 48 months (Graph 2).
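
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC-ROC and a concordance index (C-index) can be computed from predictions. It is not the authors' implementation: the function names `auc_roc` and `harrell_c_index`, the toy inputs, and the choice of Harrell's formulation with a simplified handling of tied times are all assumptions made for illustration.

```python
from itertools import combinations


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative).

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier event expected)

    A pair is comparable when the patient with the shorter time had an
    observed event. Pairs with equal times are skipped, which is a
    simplification assumed here.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(harrell_c_index([5, 10, 15, 20], [1, 0, 1, 1], [0.9, 0.8, 0.3, 0.4]))
```

Both metrics take the value 0.5 for a model that ranks cases at random and 1.0 for a perfect ranking, which gives a scale for reading the reported 0.85 and 0.71 to 0.75. The confidence interval quoted for the AUC would need an additional step, such as bootstrapping, that this sketch does not cover.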

Conclusion

The novel ensemble framework demonstrates promising performance in early-stage CRC screening, diagnosis, and prognosis. Further research is needed to validate the models using larger, real-world datasets and to assess their clinical utility in prospective studies.