Improving Machine Learning Algorithms with CoClust-Based Feature Selection on Big Data: A Comparative Analysis

Ilhan Taşkin, Zeynep; Yildirak, Şahap

Directional and Multivariate Statistics: A Volume in Honour of Ashis SenGupta, Somesh Kumar, Barry C. Arnold, Kunio Shimizu, Arnab Kumar Laha (Eds.), Springer Singapore, Singapore, pp. 411-439, 2025

Publication Type: Book Chapter / Research Book
Publication Date: 2025
Publisher: Springer Singapore
City of Publication: Singapore
Pages: pp. 411-439
Editors: Somesh Kumar, Barry C. Arnold, Kunio Shimizu, Arnab Kumar Laha
Affiliated with Hacettepe University: Yes

Abstract

Adding a feature selection stage when building machine learning models can lead to better outcomes. The dependency structure among the variables is regarded as the most crucial factor in this stage. The Copula-Based Clustering technique (CoClust), which relies on non-linear dependence and groups only related variables, offers an advantage in identifying that structure. In this study, we demonstrate that combining the Random Forest, AdaBoost, and XGBoost approaches with a CoClust-based feature selection step can yield a notable improvement in CPU time and accuracy. To assess CoClust's contribution to these algorithms, we compare it with K-means and hierarchical clustering on two big data sets. The results are compared in terms of CPU time, accuracy, and the receiver operating characteristic (ROC) curve.
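The general shape of dependence-based feature selection can be illustrated with a minimal Python sketch. It is not CoClust and not the chapter's implementation: as a simple stand-in for copula-based dependence, it groups features whose absolute Spearman rank correlation exceeds a threshold (Spearman's rho depends only on the ranks, so it also detects monotone non-linear dependence) and keeps one representative per group. The function names, the threshold of 0.7, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
import time


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def group_features(columns, threshold=0.7):
    """Merge features with |rho| >= threshold (hypothetical helper, union-find)."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(spearman(columns[a], columns[b])) >= threshold:
                parent[find(b)] = find(a)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def select_representatives(columns, target, groups):
    """From each group, keep the feature most rank-correlated with the target."""
    return [max(g, key=lambda f: abs(spearman(columns[f], target))) for g in groups]


# Synthetic data (assumed): two latent signals, monotone transforms, and noise.
random.seed(0)
n = 300
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
columns = {
    "a": z1,
    "a_cubed": [v ** 3 for v in z1],      # non-linear but monotone in "a"
    "b": z2,
    "b_exp": [math.exp(v) for v in z2],   # non-linear but monotone in "b"
    "noise": [random.gauss(0, 1) for _ in range(n)],
}
target = [1 if a + 0.5 * b > 0 else 0 for a, b in zip(z1, z2)]

start = time.process_time()               # CPU time, not wall-clock time
groups = group_features(columns, threshold=0.7)
selected = select_representatives(columns, target, groups)
cpu_seconds = time.process_time() - start

print("groups:", groups)
print("selected:", selected)
print(f"CPU time: {cpu_seconds:.4f} s")
```

In a comparison of the kind the abstract describes, a classifier would then be trained once on all columns and once on the `selected` columns only, with CPU time, accuracy, and the ROC curve recorded for each run.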