Modelling Unbalanced Catastrophic Health Expenditure Data by Using Machine‐Learning Methods

Çinaroğlu, SONGÜL

doi:10.1002/isaf.1483

Çinaroğlu S.

Intelligent Systems in Accounting, Finance and Management, vol.1, no.1, pp.1-14, 2020 (ESCI)

Publication Type: Article / Full Article
Volume: 1 Issue: 1
Publication Date: 2020
DOI Number: 10.1002/isaf.1483
Journal Name: Intelligent Systems in Accounting, Finance and Management
Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus
Pages: pp.1-14
Hacettepe University Affiliated: Yes

Abstract

This study aims to compare the performances of logistic regression and random forest classifiers in a balanced oversampling procedure for the prediction of households that will face catastrophic out‐of‐pocket (OOP) health expenditure. Data were derived from the nationally representative household budget survey collected by the Turkish Statistical Institute for the year 2012. A total of 9,987 households returned valid surveys. The data set was highly imbalanced, and the percentage of households facing catastrophic OOP health expenditure was 0.14. Balanced oversampling was performed, and 30 artificial data sets were generated with sizes of 5% and 98% of the original data size. The balanced oversampled data set provided accurate predictions, and random forest exhibited superior performance in identifying households facing catastrophic OOP health expenditure (area under the receiver operating characteristic curve, AUC = 0.8765; classification accuracy, CA = 0.7936; sensitivity = 0.7765; specificity = 0.8552; F1 = 0.7797).
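
The abstract does not spell out the resampling procedure or the software used, so the following is only a minimal sketch under stated assumptions: it treats "balanced oversampling" as plain random resampling of the minority class with replacement until both classes are the same size, and it shows how classification accuracy, sensitivity, specificity, and F1 follow from a binary confusion matrix. The function names `balanced_oversample` and `binary_metrics` and the toy data are hypothetical and are not taken from the study. AUC is left out because it requires predicted scores rather than hard class labels.

```python
import random


def balanced_oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumption: simple random oversampling with replacement; the study's
    exact procedure may differ.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw extra minority indices with replacement to close the size gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical toy data: 1,000 households, 20 of them in the positive class.
toy_labels = [1] * 20 + [0] * 980
toy_rows = [[i] for i in range(1000)]
bal_rows, bal_labels = balanced_oversample(toy_rows, toy_labels)

# Hypothetical predictions for eight households, used only to show the metrics.
example = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
```

In a sketch like this, resampling would be applied to the training portion only, so that the reported metrics are computed on households the classifier has not seen in duplicated form.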