Detection of grey zones in inter-rater agreement studies

Creative Commons License

Demirhan H., YILMAZ A. E.

BMC Medical Research Methodology, vol.23, no.1, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 23 Issue: 1
  • Publication Date: 2023
  • DOI Number: 10.1186/s12874-022-01759-7
  • Journal Name: BMC Medical Research Methodology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, CINAHL, EMBASE, MEDLINE, Directory of Open Access Journals
  • Keywords: Cohen's Kappa, Gray zone, Inter-rater reliability, Ordinal levels, Transition zone, Weighted kappa
  • Hacettepe University Affiliated: Yes


© 2022, The Author(s).

Background: In inter-rater agreement studies, the assessment behaviour of raters can be influenced by their experience, training levels, the degree of willingness to take risks, and the availability of clear guidelines for the assessment. When the assessment behaviour of raters differs for some levels of an ordinal classification, a grey zone occurs between the adjacent cells corresponding to these levels around the main diagonal of the table. A grey zone introduces a negative bias to the estimate of the agreement level between the raters. In that sense, it is crucial to detect the existence of a grey zone in an agreement table.

Methods: In this study, a framework composed of a metric and the corresponding threshold is developed to identify grey zones in an agreement table. The symmetry model and Cohen’s kappa are used to define the metric, and the threshold is based on a nonlinear regression model. A numerical study is conducted to assess the accuracy of the developed framework. Real data examples are provided to illustrate the use of the metric and the impact of identifying a grey zone.

Results: The sensitivity and specificity of the proposed framework are shown to be very high under moderate, substantial, and near-perfect agreement levels for 3 × 3 and 4 × 4 tables and sample sizes greater than or equal to 100 and 50, respectively. Real data examples demonstrate that when a grey zone is detected in the table, it is possible to report a notably higher level of agreement in the studies.

Conclusions: The accuracy of the proposed framework is sufficiently high; hence, it provides practitioners with a precise way to detect the grey zones in agreement tables.
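
The abstract names Cohen's kappa and the symmetry model as the building blocks of the detection metric, but it does not state the metric or its threshold. The Python sketch below is therefore not the authors' framework. It only illustrates the two standard ingredients on a square agreement table: Cohen's kappa (unweighted or linearly weighted) and Bowker's chi-square statistic, which is one common test of the symmetry model and is an assumption here. The function names `cohen_kappa` and `bowker_statistic` and the 3 × 3 table are hypothetical and chosen for illustration only.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa for a square agreement table given as a list of rows.

    weights=None gives the unweighted kappa; weights="linear" gives the
    linearly weighted kappa, suitable for ordinal categories.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Marginal proportions of rater A (rows) and rater B (columns).
    row_m = [sum(row) / n for row in table]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        # Agreement weight for cell (i, j).
        if weights == "linear":
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(w(i, j) * table[i][j] / n for i, j in cells)
    p_exp = sum(w(i, j) * row_m[i] * col_m[j] for i, j in cells)
    return (p_obs - p_exp) / (1 - p_exp)


def bowker_statistic(table):
    """Bowker's chi-square statistic for the symmetry model n_ij = n_ji.

    Large values indicate asymmetry between cells mirrored across the
    main diagonal. Pairs with no observations are skipped.
    """
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / total
    return stat


# Hypothetical 3 x 3 table (rows: rater A, columns: rater B), n = 100.
# The counts just above the diagonal exceed those just below it, which
# is the kind of asymmetry between adjacent levels the abstract describes.
example = [
    [20, 10, 0],
    [2, 30, 8],
    [0, 3, 27],
]

if __name__ == "__main__":
    print("unweighted kappa:", cohen_kappa(example))
    print("linear weighted kappa:", cohen_kappa(example, weights="linear"))
    print("Bowker statistic:", bowker_statistic(example))
```

Under the symmetry model the Bowker statistic is approximately chi-square distributed, with one degree of freedom per off-diagonal pair that has observations. How the published framework combines symmetry and kappa into a single metric, and how its regression-based threshold is set, should be taken from the article itself.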