Measuring language ability of students with compensatory multidimensional CAT: A post-hoc simulation study



Ozdemir B., GELBAL S.

EDUCATION AND INFORMATION TECHNOLOGIES, vol. 27, no. 5, pp. 6273-6294, 2022 (SSCI)

  • Publication Type: Article / Full Article
  • Volume: 27 Issue: 5
  • Publication Date: 2022
  • DOI: 10.1007/s10639-021-10853-0
  • Journal Name: EDUCATION AND INFORMATION TECHNOLOGIES
  • Journal Indexes: Social Sciences Citation Index (SSCI), Emerging Sources Citation Index (ESCI), Scopus, EBSCO Education Source, Educational Research Abstracts (ERA), ERIC (Education Resources Information Center), INSPEC
  • Page Numbers: pp. 6273-6294
  • Keywords: Computerized adaptive testing, Language testing, Post-hoc simulation, Multidimensional IRT, ITEM SELECTION, ADAPTIVE TEST, CONSTRAINTS, INFORMATION
  • Hacettepe University Affiliated: Yes

Abstract

Computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods used. This study aims to investigate the performance of MCAT designs used to measure the language ability of students and to compare the results of these designs with the outcomes of corresponding paper-pencil tests. For this purpose, items from the English Proficiency Tests (EPT) were used to create a multidimensional item pool of 599 items. The performance of the MCAT designs was evaluated and compared based on reliability coefficients, root mean square error (RMSE), test length, and root mean squared difference (RMSD) statistics. In total, 36 different conditions were investigated. The results of the post-hoc simulation designs indicate that MCAT designs with the A-optimality item selection method outperformed MCAT designs with other item selection methods by decreasing the test length and RMSD values without any sacrifice in test reliability. Additionally, the best error variance stopping rule for each MCAT algorithm with A-optimality item selection could be considered 0.25, with an average test length of 27.9 items, and 30 items for the fixed test-length stopping rule with the Bayesian MAP method. Overall, MCAT designs tend to decrease the test length by 60 to 65 percent and provide ability estimates with higher precision compared to the traditional paper-pencil tests of 65 to 75 items. Therefore, it is suggested to use the A-optimality method for item selection and the Bayesian MAP method for ability estimation in MCAT designs, since the MCAT algorithm with these specifications performs better than the others.
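The post-hoc simulation loop the abstract describes — replaying recorded responses adaptively, selecting the most informative remaining item, re-estimating ability with a Bayesian MAP step, and stopping once the error variance drops below a threshold such as 0.25 — can be sketched as follows. This is a minimal illustration, not the authors' code: it simplifies to a unidimensional 2PL pool (the study uses a compensatory multidimensional model and A-optimality selection), and all pool sizes, parameter ranges, and function names are hypothetical.

```python
import numpy as np

# Hypothetical 2PL item pool standing in for the multidimensional EPT pool.
rng = np.random.default_rng(0)
n_items = 60
a = rng.uniform(0.8, 2.0, n_items)   # discrimination parameters
b = rng.uniform(-2.0, 2.0, n_items)  # difficulty parameters

# "Recorded" responses from a paper-pencil administration, simulated here.
true_theta = 0.5
p_true = 1 / (1 + np.exp(-a * (true_theta - b)))
responses = (rng.random(n_items) < p_true).astype(int)

grid = np.linspace(-4, 4, 161)  # theta grid for MAP estimation

def map_estimate(administered, resp):
    """MAP estimate over a theta grid with a standard-normal prior."""
    log_post = -0.5 * grid**2  # log of N(0, 1) prior, up to a constant
    for j, u in zip(administered, resp):
        p = 1 / (1 + np.exp(-a[j] * (grid - b[j])))
        log_post += u * np.log(p) + (1 - u) * np.log(1 - p)
    return grid[np.argmax(log_post)]

def item_info(theta, j):
    """Fisher information of item j under the 2PL model."""
    p = 1 / (1 + np.exp(-a[j] * (theta - b[j])))
    return a[j] ** 2 * p * (1 - p)

# Post-hoc CAT loop: stop when error variance (1 / total info) < 0.25.
administered, resp = [], []
theta_hat, total_info = 0.0, 1.0  # N(0,1) prior contributes information 1
while total_info < 1 / 0.25 and len(administered) < n_items:
    remaining = [j for j in range(n_items) if j not in administered]
    j = max(remaining, key=lambda k: item_info(theta_hat, k))  # max-info pick
    administered.append(j)
    resp.append(responses[j])
    theta_hat = map_estimate(administered, resp)
    total_info = 1.0 + sum(item_info(theta_hat, k) for k in administered)

print(f"items used: {len(administered)}, theta estimate: {theta_hat:.2f}")
```

The same skeleton generalizes to the multidimensional case by replacing the scalar information with the Fisher information matrix and the maximum-information pick with an A-optimality criterion (minimizing the trace of the inverse information matrix), which is the selection rule the study recommends.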