Information Sciences, cilt.655, 2024 (SCI-Expanded)
This article proposes a new fuzzy logistic regression framework with high classification performance against imbalance and separation while keeping the interpretability of classical logistic regression. Separation and imbalance are two core problems in logistic regression, which can result in biased coefficient estimates and inaccurate predictions. Existing research on fuzzy logistic regression primarily focuses on developing possibilistic models instead of using a logit link function that converts log-odds ratios to probabilities. At the same time, little consideration is given to issues of separation and imbalance. Our study aims to address these challenges by proposing new methods of fuzzifying binary variables and classifying subjects based on a comparison against a fuzzy threshold. We use combinations of fuzzy and crisp predictors, output, and coefficients to understand which combinations perform better under imbalance and separation. Numerical experiments with synthetic and real datasets are conducted to demonstrate the usefulness and superiority of the proposed framework. Seven crisp machine learning models are implemented for benchmarking in the numerical experiments. The proposed framework shows consistently strong performance results across datasets with imbalance or separation and performs equally well when such issues are absent. Meanwhile, the considered machine learning methods are significantly impacted by the imbalanced datasets.