A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory

Alkan, Meral; DOĞAN, NURİ

doi:10.21031/epod.1210917

Alkan M., DOĞAN N.

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, vol.14, no.2, pp.106-117, 2023 (Scopus)

Publication Type: Article / Full Article
Volume: 14 Issue: 2
Publication Date: 2023
DOI: 10.21031/epod.1210917
Journal Name: Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi
Journal Indexes: Scopus, TR DİZİN (ULAKBİM)
Page Numbers: pp.106-117
Hacettepe University Affiliated: Yes

Abstract

This study uses Generalizability Theory to compare the designs obtained when four raters score the open-ended items of the PISA 2009 reading literacy test either all together or alternately. The sample consisted of 362 students (out of the 4996 students who participated in PISA 2009) who responded to the reading items and were scored by more than one rater. Two designs were constructed for the generalizability analyses. The first was the crossed design, symbolized as “s x i x r” (student x item x rater), in which every student is scored by every rater on the same skills. The second was the nested design, symbolized as “(r:s) x i”, in which each rater scored only a group of students, so that raters were nested within students and items were crossed with both. The relative and absolute error variances estimated for the (r:s) x i design were smaller than those for the s x i x r design, and the G and Phi coefficients were therefore larger. In the D study, the G and Phi coefficients increased with the number of raters in both designs. In Booklet 2, acceptable G and Phi values were still reached when the number of raters was halved, whereas in Booklet 8 increasing the number of raters appeared more appropriate.
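
To make the quantities in the abstract concrete, the sketch below shows how G and Phi coefficients are typically obtained in a D study for the two designs, using the standard random-effects formulas of Generalizability Theory. It is only an illustration: the class names, function names, and every variance component value are hypothetical and are not taken from the study, so the printed numbers say nothing about which design performs better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossedVC:
    """Variance components for the crossed s x i x r design."""
    s: float      # students (object of measurement)
    i: float      # items
    r: float      # raters
    si: float     # student x item
    sr: float     # student x rater
    ir: float     # item x rater
    sir_e: float  # three-way interaction confounded with residual error


@dataclass(frozen=True)
class NestedVC:
    """Variance components for the nested (r:s) x i design."""
    s: float          # students (object of measurement)
    i: float          # items
    r_in_s: float     # raters nested in students (r and sr confounded)
    si: float         # student x item
    ri_in_s_e: float  # (r:s) x i interaction confounded with residual error


def d_study_crossed(vc: CrossedVC, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for s x i x r with n_i items and n_r raters."""
    rel = vc.si / n_i + vc.sr / n_r + vc.sir_e / (n_i * n_r)
    # Absolute error adds the facet main effects and their interaction.
    abs_err = rel + vc.i / n_i + vc.r / n_r + vc.ir / (n_i * n_r)
    return vc.s / (vc.s + rel), vc.s / (vc.s + abs_err)


def d_study_nested(vc: NestedVC, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for (r:s) x i with n_i items and n_r raters per student."""
    rel = vc.r_in_s / n_r + vc.si / n_i + vc.ri_in_s_e / (n_i * n_r)
    # Only the item main effect is added for absolute decisions.
    abs_err = rel + vc.i / n_i
    return vc.s / (vc.s + rel), vc.s / (vc.s + abs_err)


if __name__ == "__main__":
    # Hypothetical variance components, NOT estimates from the study.
    crossed = CrossedVC(s=0.30, i=0.05, r=0.01, si=0.10, sr=0.02,
                        ir=0.01, sir_e=0.20)
    nested = NestedVC(s=0.30, i=0.05, r_in_s=0.03, si=0.10, ri_in_s_e=0.20)
    for n_r in (1, 2, 4, 8):
        g_c, phi_c = d_study_crossed(crossed, n_i=5, n_r=n_r)
        g_n, phi_n = d_study_nested(nested, n_i=5, n_r=n_r)
        print(f"n_r={n_r}: crossed G={g_c:.3f} Phi={phi_c:.3f} | "
              f"nested G={g_n:.3f} Phi={phi_n:.3f}")
```

Because the number of raters appears only in denominators of the error terms, both coefficients rise as raters are added, which is the pattern the abstract reports for the D study. Phi can never exceed G, since absolute error includes every relative error term plus the facet main effects.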