Comparative regression performances of machine learning methods optimising hyperparameters: application to health expenditures


Çinaroğlu S., Başer O.

International Journal of Bioinformatics Research and Applications, vol.16, no.4, pp.387-407, 2020 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 4
  • Publication Date: 2020
  • Journal Name: International Journal of Bioinformatics Research and Applications
  • Journal Indexes: Scopus, Aerospace Database, Agricultural & Environmental Science Database, Biotechnology Research Abstracts, Communication Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
  • Pages: pp.387-407
  • Hacettepe University Affiliated: Yes

Abstract

Machine learning (ML) algorithms are used in various areas. However, there has been no study analysing health expenditures using ML methods. This work is a step forward in comparing the regression performances of lasso (L), K-nearest neighbourhood (KNN), random forest (RF) and support vector machine (SVM) regression while changing hyperparameter values. In this study, lambda (λ), number of neighbours (NN), number of trees (NT) and the epsilon (ε) parameter were determined as the hyperparameters for L, KNN, RF and SVM regression, respectively. K-fold cross-validation was performed to examine regression performance results. Study results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and L (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the L, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using ML methods to predict health expenditures.
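
The evaluation procedure described in the abstract (varying a hyperparameter and scoring each setting with K-fold cross-validation using R², RMSE and MAE) can be illustrated with a minimal sketch. This is not the authors' implementation or data: it uses only the Python standard library, covers only KNN regression with the number of neighbours (NN) as the hyperparameter, and runs on synthetic one-feature data generated for illustration. The function names (`knn_predict`, `regression_metrics`, `k_fold_cv`), the fold count of 5 and the candidate NN values are all assumptions.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict y for x as the mean target of the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired lists of observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def k_fold_cv(xs, ys, n_neighbours, n_folds=5, seed=0):
    """Average (R2, RMSE, MAE) of KNN regression over n_folds held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [idx[f::n_folds] for f in range(n_folds)]
    totals = [0.0, 0.0, 0.0]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        preds = [knn_predict(tx, ty, xs[i], n_neighbours) for i in held_out]
        truth = [ys[i] for i in held_out]
        for j, m in enumerate(regression_metrics(truth, preds)):
            totals[j] += m
    return tuple(t / n_folds for t in totals)


# Synthetic data (an assumption for illustration, not health expenditure data):
# a linear trend plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Compare cross-validated scores across candidate NN values.
results = {k: k_fold_cv(xs, ys, k) for k in (1, 5, 15)}
for k, (r2, rmse, mae) in results.items():
    print(f"NN={k:2d}  R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

The same loop structure would apply to λ for lasso, NT for random forest and ε for SVM regression, with the model-fitting step swapped out; in practice a library such as scikit-learn would typically replace the hand-written pieces.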