Comparative regression performances of machine learning methods optimising hyperparameters: application to health expenditures

Çinaroğlu S., Başer O.

International Journal of Bioinformatics Research and Applications, vol.16, no.4, pp.387-407, 2020 (Scopus)

  • Publication Type: Article
  • Volume: 16 Issue: 4
  • Publication Date: 2020
  • Journal Name: International Journal of Bioinformatics Research and Applications
  • Journal Indexes: Scopus, Aerospace Database, Agricultural & Environmental Science Database, Biotechnology Research Abstracts, Communication Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.387-407
  • Hacettepe University Affiliated: Yes


Machine learning (ML) algorithms are used in various areas. However, there has been no study analysing health expenditures using ML methods. This work is a step forward in comparing the regression performances of lasso (L), K-nearest neighbours (KNN), random forest (RF) and support vector machine (SVM) regression while changing hyperparameter values. In this study, lambda (λ), the number of neighbours (NN), the number of trees (NT) and the epsilon (ε) parameter were chosen as the hyperparameters for L, KNN, RF and SVM regression, respectively. K-fold cross-validation was performed to examine regression performance. The results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and L (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the L, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using ML methods to predict health expenditures.
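
The evaluation scheme described in the abstract (vary one hyperparameter, score each value with K-fold cross-validation, compare R², RMSE and MAE) can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' code: the data are synthetic, only KNN regression with its number of neighbours is shown, the fold count and candidate values are arbitrary, and all function names (`knn_predict`, `regression_metrics`, `k_fold_indices`, `cross_validate_knn`) are hypothetical. Metrics are computed on pooled out-of-fold predictions, which is one of several reasonable conventions.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate_knn(xs, ys, k, n_folds=5, seed=0):
    """Score one hyperparameter value on pooled out-of-fold predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(ys)) if i not in held]
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(knn_predict(train_x, train_y, xs[i], k))
    return regression_metrics(y_true, y_pred)


# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

# Hyperparameter sweep over the number of neighbours.
results = {k: cross_validate_knn(xs, ys, k) for k in (1, 3, 5, 10, 20)}
best_k = min(results, key=lambda k: results[k]["rmse"])

for k, m in results.items():
    print(f"NN={k:2d}  R2={m['r2']:.3f}  RMSE={m['rmse']:.3f}  MAE={m['mae']:.3f}")
print("lowest cross-validated RMSE at NN =", best_k)
```

The same loop would apply to the other three methods by swapping the predictor and sweeping λ, the number of trees, or ε instead of the number of neighbours. In practice a library such as scikit-learn would usually replace the hand-written pieces.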