JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, vol. 23, no. 6, pp. 297-304, 2012 (SCI-Expanded)
Sampling strategies, which play a significant role in examining data characteristics (e.g., imbalanced, small, or exhaustive data), have been discussed in the literature for the last couple of decades. In this study, the sampling problem encountered with small, continuous data sets is examined. Two approaches are applied: sampling from measured data using k-fold cross validation, and sampling with synthetic data generated by fuzzy c-means clustering. The performances of genetic programming (GP) and the adaptive neuro-fuzzy inference system (ANFIS) on the resulting data sets are then discussed. The experimental results indicate that fuzzy c-means based synthetic sampling is more successful than k-fold cross validation when modeling small, continuous data sets with ANFIS and GP, so it can be proposed for this type of data set. Additionally, ANFIS performs slightly better than GP when synthetic data are employed, whereas under k-fold cross validation GP is less sensitive to the data set and produces outputs within a narrower range than ANFIS.
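The two sampling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how synthetic records are derived from the fuzzy c-means result, so the sketch assumes that the cluster centers themselves are used as synthetic records, with each row of `X` holding the inputs and the target together. The function names (`k_fold_indices`, `fuzzy_c_means`), the fuzzifier `m = 2`, and the iteration settings are likewise illustrative choices.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_idx, test_idx) pairs over a shuffled index range."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    splits = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, test))
    return splits

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Assumption: cluster centers serve as the synthetic records.
def synthetic_samples(X, n_synthetic, seed=0):
    centers, _ = fuzzy_c_means(X, c=n_synthetic, seed=seed)
    return centers
```

With a small data set `X` of shape `(n, d)`, `k_fold_indices(len(X), 5)` yields five train/test index pairs over the measured data, while `synthetic_samples(X, 10)` returns ten synthetic records that always lie within the range of the measured values, since each center is a weighted mean of the data.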