Şimşek C., Yalçın Ş., Üçdal M. T., Karakoç D.
JOURNAL OF CANCER POLICY, vol.100535, pp.1, 2024 (ESCI)
-
Publication Type:
Article
-
Volume:
100535
-
Publication Date:
2024
-
Doi Number:
10.1016/j.jcpo.2024.100535
-
Journal Name:
JOURNAL OF CANCER POLICY
-
Journal Indexes:
Emerging Sources Citation Index (ESCI), Scopus, CAB Abstracts, MEDLINE
-
Page Numbers:
pp.1
-
Hacettepe University Affiliated:
Yes
Abstract
Background
Colorectal cancer (CRC) remains a significant global health burden, with early detection and intervention crucial for improving patient outcomes. This study aims to develop and evaluate a novel proof-of-concept ensemble framework combining transformer-based language models and decision tree-based models for early-stage CRC screening, diagnosis, and prognosis.
Methods
The ensemble framework consists of four key components: (1) GastroGPT, a transformer-based language model for extracting relevant data points from patient histories; (2) a decision tree-based model for assessing CRC risk and recommending colonoscopy; (3) GastroGPT for extracting data points from early CRC patients' histories; and (4) a suite of decision tree-based models for predicting survival outcomes in early-stage CRC patients. The study employed a retrospective, observational, methodological design using simulated patient cases (Fig. 1).
Results
GastroGPT demonstrated high accuracy in extracting relevant data points from patient histories. The decision tree-based model for CRC risk assessment achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.85 (95% CI: 0.78-0.92) in predicting the need for colonoscopy (Graph 1). The decision tree-based models for survival prediction showed strong performance, with C-indices ranging from 0.71 to 0.75 for overall survival and disease-free survival at 24, 36, and 48 months (Graph 2).
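For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an AUC-ROC and a Harrell-style concordance index (C-index) can be computed from model outputs. It is not the authors' code: the function names `auc_roc` and `harrell_c_index` and all toy values are hypothetical, and the C-index version is simplified in that it skips pairs with tied survival times.

```python
from itertools import combinations


def auc_roc(labels, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs in which the patient with the
    shorter observed survival time also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; also skip pairs where the
        # earlier patient was censored, since their order is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration (not from the study).
toy_labels = [1, 1, 0, 0, 1, 0]            # 1 = colonoscopy indicated
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]  # predicted risk
toy_auc = auc_roc(toy_labels, toy_scores)

toy_times = [5, 10, 15, 20]     # months of follow-up
toy_events = [1, 0, 1, 1]       # 1 = event observed, 0 = censored
toy_risks = [0.9, 0.4, 0.2, 0.7]  # predicted risk scores
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

On both scales, 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the frame in which the reported 0.85 and 0.71 to 0.75 values should be read.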
Conclusion
The novel ensemble framework demonstrates promising performance in early-stage CRC screening, diagnosis, and prognosis. Further research is needed to validate the models using larger, real-world datasets and to assess their clinical utility in prospective studies.