A Monte Carlo fuzzy logistic regression framework against imbalance and separation

Charizanos, Georgios; Demirhan, Haydar; İÇEN, DUYGU

doi:10.1016/j.ins.2023.119893

Information Sciences, vol. 655, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 655
Publication Date: 2024
DOI: 10.1016/j.ins.2023.119893
Journal Name: Information Sciences
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Library, Information Science & Technology Abstracts (LISTA), Metadex, MLA - Modern Language Association Database, zbMATH, Civil Engineering Abstracts
Keywords: Binary response, Fuzzification, Imbalance, Logistic regression, Monte Carlo, Separation, Triangular fuzzy numbers
Hacettepe University Affiliated: Yes

Abstract

This article proposes a new fuzzy logistic regression framework with high classification performance against imbalance and separation while keeping the interpretability of classical logistic regression. Separation and imbalance are two core problems in logistic regression, which can result in biased coefficient estimates and inaccurate predictions. Existing research on fuzzy logistic regression primarily focuses on developing possibilistic models instead of using a logit link function that converts log-odds ratios to probabilities. At the same time, little consideration is given to issues of separation and imbalance. Our study aims to address these challenges by proposing new methods of fuzzifying binary variables and classifying subjects based on a comparison against a fuzzy threshold. We use combinations of fuzzy and crisp predictors, output, and coefficients to understand which combinations perform better under imbalance and separation. Numerical experiments with synthetic and real datasets are conducted to demonstrate the usefulness and superiority of the proposed framework. Seven crisp machine learning models are implemented for benchmarking in the numerical experiments. The proposed framework shows consistently strong performance results across datasets with imbalance or separation and performs equally well when such issues are absent. Meanwhile, the considered machine learning methods are significantly impacted by the imbalanced datasets.
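The idea of fuzzifying a binary response and combining it with Monte Carlo sampling can be illustrated with a minimal sketch. This is not the estimator from the article, whose details are not given in the abstract. It assumes a hypothetical scheme in which each crisp 0/1 label becomes a triangular fuzzy number with an arbitrary `spread`, values are sampled from the matching triangular distribution, and a least-squares fit on the sampled log-odds is averaged over draws. The function names `fuzzify_binary` and `monte_carlo_logit_fit`, and all parameter defaults, are illustrative assumptions.

```python
import numpy as np


def fuzzify_binary(y, spread=0.2):
    """Map crisp 0/1 labels to triangular fuzzy numbers (left, mode, right).

    Assumption: a symmetric spread clipped to [0, 1]; the article may
    use a different fuzzification rule.
    """
    y = np.asarray(y, dtype=float)
    left = np.clip(y - spread, 0.0, 1.0)
    right = np.clip(y + spread, 0.0, 1.0)
    return np.column_stack([left, y, right])


def monte_carlo_logit_fit(X, y, spread=0.2, n_draws=200, seed=0):
    """Average least-squares fits on the log-odds of sampled fuzzy responses.

    Illustrative only: this replaces maximum likelihood with a linear fit
    on sampled log-odds, so the coefficients stay finite even when the
    crisp data are perfectly separated.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept + predictors
    tfn = fuzzify_binary(y, spread)
    eps = 1e-6  # keeps sampled values strictly inside (0, 1)
    draws = []
    for _ in range(n_draws):
        # One Monte Carlo draw per observation from its triangular fuzzy number.
        p = rng.triangular(tfn[:, 0], tfn[:, 1], tfn[:, 2])
        p = np.clip(p, eps, 1.0 - eps)
        log_odds = np.log(p / (1.0 - p))
        beta, *_ = np.linalg.lstsq(design, log_odds, rcond=None)
        draws.append(beta)
    return np.mean(draws, axis=0)


# Perfectly separated toy data: every x < 0 has y = 0, every x > 0 has y = 1.
x_toy = np.array([-2.0, -1.0, 1.0, 2.0])
y_toy = np.array([0, 0, 1, 1])
coef = monte_carlo_logit_fit(x_toy, y_toy)
```

On the separated toy data, classical maximum likelihood would push the slope toward infinity, while this sketch returns a finite intercept and slope in `coef`. The choice of `spread` directly controls how large those coefficients can become, so it would need to be justified or tuned in any real application.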