Comparative regression performances of machine learning methods optimising hyperparameters: application to health expenditures


Çinaroğlu S., Başer O.

International Journal of Bioinformatics Research and Applications, vol.16, no.4, pp.387-407, 2020 (Refereed Journals of Other Institutions)

  • Publication Type: Article
  • Volume: 16 Issue: 4
  • Publication Date: 2020
  • Title of Journal: International Journal of Bioinformatics Research and Applications
  • Page Numbers: pp.387-407

Abstract

Machine learning (ML) algorithms are used in various areas. However, there has been no study analysing health expenditures using ML methods. This work is a step forward in comparing the regression performances of lasso (L), K-nearest neighbours (KNN), random forest (RF) and support vector machine (SVM) regression while changing hyperparameter values. In this study, lambda (λ), number of neighbours (NN), number of trees (NT) and epsilon (ε) were selected as the hyperparameters for L, KNN, RF and SVM regression, respectively. K-fold cross-validation was performed to examine regression performance results. Study results show that KNN (R² > 0.75; RMSE < 0.70; MAE < 0.55) and L (R² > 0.79; RMSE < 0.20; MAE < 0.15) regression yield better results in predicting health expenditure per capita and out-of-pocket health expenditure (%), respectively. Moreover, the performance differences among the L, KNN, RF and SVM regression methods are statistically significant (p < 0.001). It is hoped that these results will stimulate further interest in using ML methods to predict health expenditures.
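
As an illustration of the evaluation scheme the abstract describes (vary one hyperparameter, score each setting with K-fold cross-validation, report R², RMSE and MAE), the following is a minimal sketch for the KNN case only. It is not the authors' code: the data are synthetic, the function names (`knn_predict`, `regression_metrics`, `cross_validate_knn`) are hypothetical, and pooling the out-of-fold predictions before scoring is an assumption, since the abstract does not say how fold results were combined. The same loop would apply to λ, NT or ε with a different regressor.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired true and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def cross_validate_knn(xs, ys, k_neighbours, n_folds=5, seed=0):
    """Run K-fold CV, pool the out-of-fold predictions, and score them once.

    Pooling is an assumption of this sketch; averaging per-fold scores is an
    equally common choice.
    """
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    y_true, y_pred = [], []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(knn_predict(train_x, train_y, xs[i], k_neighbours))
    return regression_metrics(y_true, y_pred)


# Synthetic stand-in data (NOT health expenditure data): y = sin(x) + noise.
rng = random.Random(42)
xs = [(rng.uniform(0, 10),) for _ in range(200)]
ys = [math.sin(x[0]) + rng.gauss(0, 0.1) for x in xs]

# Sweep the number of neighbours (NN) and score each value with 5-fold CV.
results = {k: cross_validate_knn(xs, ys, k) for k in (1, 5, 25, 100)}
for k, (r2, rmse, mae) in results.items():
    print(f"NN={k:3d}  R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")

# Pick the setting with the lowest cross-validated RMSE.
best_k = min(results, key=lambda k: results[k][1])
print("best NN by RMSE:", best_k)
```

On this toy problem a very large NN oversmooths the target and a very small NN follows the noise, which is the kind of sensitivity a hyperparameter comparison is meant to expose. The thresholds reported in the abstract come from the study's own data and cannot be reproduced from this sketch.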