The aim of this study was to identify similarities and differences among four methods for detecting differential item functioning (DIF): MIMIC, SIBTEST, logistic regression, and Mantel-Haenszel. The consistency between the results of these methods and expert opinions was also examined. The study was carried out with subsamples of 300, 600, 1000, 1200, and 2000 examinees drawn from a dataset of approximately 340,000 students. Items flagged for DIF in common by the four methods were examined for each sample size. Item 2 did not show common DIF in the sample of 300, nor did item 13 in the sample of 600, and no item showed common DIF in the sample of 1000, whereas item 19 showed DIF in the sample of 1200 and items 2, 3, and 4 showed DIF in the sample of 2000. In light of this, the four methods appear to yield more concordant results in the largest sample (2000 examinees). Items 2, 5, 6, and 12, which experts had identified as biased, were compared with the statistical results; these items showed DIF under different methods or sample sizes. Overall, expert opinions appear to be consistent with the results of the statistical analyses.