Classification performance comparison of deep learning and classical data mining methods on RNA-Seq data set


Kasikci M., Cosgun E., KARABULUT E.

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, vol.26, no.3-4, pp.188-201, 2021 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 26 Issue: 3-4
  • Publication Date: 2021
  • Doi Number: 10.1504/ijdmb.2021.126844
  • Journal Name: INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.188-201
  • Keywords: RNA-Seq, cancer, data mining, classification methods, deep learning
  • Hacettepe University Affiliated: Yes

Abstract

This study compares the performance of deep learning and classical classification methods on RNA-Seq data, one of the data sources used to investigate the relationship between diseases and genes. Two data sets with different characteristics are used. The first, a lung cancer data set, has two classes with balanced class ratios. The second, a renal cell carcinoma data set, has three imbalanced classes. Different gene filtering methods are applied to both data sets. The classification performance of random forest, support vector machines, artificial neural network and deep learning is evaluated on each data set under each filter. Hyper-parameters are optimised for each classification method. In general, deep learning and support vector machines achieve the highest or second-highest values on performance measures such as accuracy, F-measure and the Kappa coefficient. On the lung cancer data sets, which contain more genes and show a balanced class distribution, deep learning outperforms the classical classification methods and is recommended.
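
The three performance measures named in the abstract can be computed from true and predicted class labels. The following is a minimal sketch in plain Python, not the authors' implementation. The function name `classification_metrics` is hypothetical, and macro-averaging of the F-measure across classes is an assumption, since the abstract does not state how the F-measure is aggregated for the three-class data set.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, macro-averaged F-measure and Cohen's kappa.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)          # row totals of the confusion matrix
    pred_counts = Counter(y_pred)          # column totals of the confusion matrix

    # Accuracy: share of samples on the diagonal of the confusion matrix.
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    # Per-class F-measure: 2*TP / (2*TP + FP + FN) = 2*TP / (true + predicted).
    f_scores = []
    for c in labels:
        denom = true_counts[c] + pred_counts[c]
        f_scores.append(2 * pairs[(c, c)] / denom if denom else 0.0)
    macro_f = sum(f_scores) / len(f_scores)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {"accuracy": accuracy, "macro_f": macro_f, "kappa": kappa}
```

Because kappa subtracts the agreement expected by chance, it can be more informative than accuracy alone when class sizes are imbalanced, as in the renal cell carcinoma data set.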