Classification performance comparison of deep learning and classical data mining methods on RNA-Seq data set

Kasikci, Merve; Cosgun, Erdal; Karabulut, Erdem

doi:10.1504/ijdmb.2021.126844

Kasikci M., Cosgun E., Karabulut E.

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, vol.26, no.3-4, pp.188-201, 2021 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 26 Issue: 3-4
Publication Date: 2021
DOI Number: 10.1504/ijdmb.2021.126844
Journal Name: INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
Page Numbers: pp.188-201
Keywords: RNA-Seq, cancer, data mining, classification methods, deep learning
Hacettepe University Affiliated: Yes

Abstract

This study compares the performance of deep learning and classical classification methods on RNA-Seq data, one of the data sources used to investigate the relationship between diseases and genes. Two data sets with different characteristics are used. The first, a lung cancer data set, has two classes with balanced class ratios. The second, a renal cell carcinoma data set, has three imbalanced classes. Different gene filtering methods are applied to both data sets. The classification performance of random forest, support vector machines, artificial neural networks and deep learning is evaluated on each data set under each filter, with hyper-parameters optimised for every classification method. In general, deep learning and support vector machines achieve the highest or second-highest values of performance measures such as accuracy, F-measure and the Kappa coefficient. On the lung cancer data, which contain more genes and show a balanced class distribution, deep learning outperforms the classical classification methods and is recommended.
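The three performance measures named in the abstract can be illustrated with a short sketch. This is not the authors' code: the labels below are made-up toy values (a three-class, imbalanced example), the class names "A", "B" and "C" are hypothetical, and macro-averaging of the F-measure is an assumption, since the abstract does not say how per-class scores were combined.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) pair counts
    scores = []
    for c in labels:
        tp = counts[(c, c)]
        fp = sum(counts[(t, c)] for t in labels if t != c)
        fn = sum(counts[(c, p)] for p in labels if p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    p_expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1.0 - p_expected)

# Toy, made-up labels: three imbalanced classes (6 / 3 / 1 samples).
y_true = ["A"] * 6 + ["B"] * 3 + ["C"]
y_pred = ["A", "A", "A", "A", "A", "B", "B", "B", "A", "A"]

print("accuracy :", round(accuracy(y_true, y_pred), 4))
print("macro F  :", round(macro_f_measure(y_true, y_pred), 4))
print("kappa    :", round(cohen_kappa(y_true, y_pred), 4))
```

In this toy example the minority class "C" is never predicted, so the macro F-measure and Kappa come out well below the accuracy. This is one reason such measures are commonly reported alongside accuracy for imbalanced data.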