Directional and Multivariate Statistics: A Volume in Honour of Ashis SenGupta, Somesh Kumar, Barry C. Arnold, Kunio Shimizu, Arnab Kumar Laha, Eds., Springer Singapore, Singapore, pp. 411-439, 2025
Including a feature selection stage when building machine learning models can lead to better outcomes. The dependency structure between the variables is regarded as the most crucial factor in this stage. The Copula-Based Clustering technique (CoClust), which relies on non-linear dependency and groups only related variables, offers an advantage in identifying that dependency structure. In this study, we demonstrate that combining the Random Forest, AdaBoost, and XGBoost approaches with a CoClust-based feature selection step yields a notable improvement in both CPU time and accuracy. To assess CoClust's contribution to these algorithms, we compare it with K-means and hierarchical clustering on two different large data sets. The results are compared in terms of CPU time, accuracy, and the ROC (receiver operating characteristic) curve.
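The general idea of dependence-based feature grouping can be illustrated with a minimal sketch. This is not CoClust and not the authors' procedure: as a stand-in, it measures dependence with Kendall's tau (a rank-based measure that also detects monotone non-linear relationships), links features whose absolute tau reaches an arbitrarily chosen threshold of 0.7, and keeps one representative per group. All function names, the toy data, and the threshold are assumptions made for illustration only.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def group_features(columns, threshold=0.7):
    """Link features whose |tau| >= threshold; groups are the connected components."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(kendall_tau(columns[a], columns[b])) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def select_representatives(columns, target, threshold=0.7):
    """From each group, keep the feature with the largest |tau| against the target."""
    selected = []
    for group in group_features(columns, threshold):
        best = max(group, key=lambda name: abs(kendall_tau(columns[name], target)))
        selected.append(best)
    return selected


# Toy data (hypothetical): "cubic" is a non-linear but monotone function of
# "linear", and "neg_noise" is the mirror image of "noise".
base = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [3, 8, 1, 6, 2, 7, 4, 5]
columns = {
    "linear": base,
    "cubic": [v ** 3 for v in base],
    "noise": noise,
    "neg_noise": [-v for v in noise],
}
target = [0, 0, 0, 0, 1, 1, 1, 1]

groups = group_features(columns)
selected = select_representatives(columns, target)
print(groups)
print(selected)
```

The reduced feature set in `selected` would then be passed to a classifier such as Random Forest, AdaBoost, or XGBoost, and the training time and accuracy compared against a run on the full feature set.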