Prediction of gastric cancer by machine learning integrated with mass spectrometry-based N-glycomics

Demirhan D. B., Yılmaz H., Erol H., Kayili H. M., Salih B.

Analyst, vol.148, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 148
  • Publication Date: 2023
  • DOI Number: 10.1039/d2an02057b
  • Journal Name: Analyst
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), CAB Abstracts, Chemical Abstracts Core, Chimica, Communication Abstracts, Compendex, EMBASE, Food Science & Technology Abstracts, MEDLINE, Metadex, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Hacettepe University Affiliated: Yes


Early and accurate diagnosis of gastric cancer is vital for effective and targeted treatment. Glycosylation profiles are known to change during cancer tissue development. This study aimed to profile the N-glycans in gastric cancer tissues and to predict gastric cancer using machine learning algorithms. The (glyco)proteins of formalin-fixed paraffin-embedded (FFPE) gastric cancer and adjacent control tissues were extracted by chloroform/methanol extraction after the conventional deparaffinization step. The N-glycans were released and labeled with a 2-aminobenzoic acid (2-AA) tag. MALDI-MS analysis of the 2-AA-labeled N-glycans was performed in negative ionization mode, and fifty-nine N-glycan structures were determined. The relative and analyte areas of the detected N-glycans were extracted from the obtained data. Statistical analyses identified significant differences in the expression levels of 14 N-glycans in gastric cancer tissues. The data were divided into datasets based on the physical characteristics of the N-glycans and used to test machine learning models. The multilayer perceptron (MLP) was found to be the most appropriate model, with the highest sensitivity, specificity, accuracy, Matthews correlation coefficient, and F1 scores for each dataset. The highest accuracy score (96.0 ± 1.3) was obtained from the whole N-glycans relative area dataset, and the AUC value was determined as 0.98. It was concluded that gastric cancer tissues could be distinguished from adjacent control tissues with high accuracy using mass spectrometry-based N-glycomic data.
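
The evaluation metrics named in the abstract (sensitivity, specificity, accuracy, Matthews correlation coefficient, and F1 score) all follow from the four counts of a binary confusion matrix. The sketch below illustrates those standard definitions only; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: cancer samples classified correctly/incorrectly.
    tn/fp: control samples classified correctly/incorrectly.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)            # true positive rate (recall)
    specificity = ratio(tn, tn + fp)            # true negative rate
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts chosen for illustration; NOT results from the paper.
metrics = binary_metrics(tp=48, fp=2, tn=48, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, sensitivity, specificity, accuracy, and F1 all equal 0.96, while the Matthews correlation coefficient is 0.92, which shows that MCC can be stricter than accuracy even on a balanced dataset.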