Introduction
Reliable results from observation-based assessments depend on high-quality raters. Instructors can provide this quality, but rating adds to their workload. One alternative to instructor raters is standardized patients (SPs).

Method
In this study, students conducted an SP interview on communication with an applicant/patient as part of a clinical skills training course. SPs rated the student interviews twice: immediately after the interview and again after watching a recording. Instructors rated the students only from the recordings. To assess whether SPs are appropriate raters, SP and instructor ratings were compared using the mean scores given to the students' communication skills in the interviews. In addition, generalizability (G) theory was used to estimate the reliability of the scores.

Results
The SPs' ratings given immediately after the interviews were the highest, and they differed statistically significantly from both the SPs' and the instructors' ratings made while watching the recordings. The G coefficient was 0.71 for the 4 instructors and 0.73 for the 12 SPs. However, even with 12 SPs, the reliability coefficient reached only 0.73, which calls the reliability of their ratings into question. Among the instructors, the rater who contributed most to reliability was the one with the most experience in the subject area.

Conclusions
If SPs are to be used as raters, they will need more comprehensive training. More importantly, regardless of who the rater is, rater training is one of the most important factors in achieving reliable and valid results. Experience with and knowledge of the assessed topic are also crucial for obtaining reliable results in performance assessment.
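
The comparison between 4 instructors (G = 0.71) and 12 SPs (G = 0.73) is easier to read when both groups are projected to the same number of raters. The sketch below does this under a strong simplifying assumption that the abstract does not confirm: a single-facet, fully crossed students × raters design, in which the relative error variance shrinks in proportion to the number of raters. The function names are hypothetical and are not taken from the study. The projections illustrate the arithmetic only and are not results reported by the authors.

```python
def g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative G coefficient for an assumed persons x raters design.

    var_person   : variance attributable to true differences between students
    var_residual : person-by-rater interaction variance confounded with error
    n_raters     : number of raters whose scores are averaged
    """
    return var_person / (var_person + var_residual / n_raters)


def error_to_person_ratio(g: float, n_raters: int) -> float:
    """Invert the formula above: recover var_residual / var_person
    from a G coefficient observed with n_raters raters."""
    return n_raters * (1.0 - g) / g


def projected_g(g_observed: float, n_observed: int, n_target: int) -> float:
    """Project an observed G coefficient to a different number of raters
    (a D-study style projection under the single-facet assumption)."""
    ratio = error_to_person_ratio(g_observed, n_observed)
    return 1.0 / (1.0 + ratio / n_target)


if __name__ == "__main__":
    # Coefficients taken from the Results; everything derived from them
    # rests on the assumed design and is illustrative only.
    for n in (1, 4, 12):
        print(n, round(projected_g(0.71, 4, n), 2), round(projected_g(0.73, 12, n), 2))
```

Under this assumption, projecting the SP coefficient down to 4 raters gives a lower value than the instructors achieved with 4 raters, which is consistent with the abstract's point that a coefficient of 0.73 from 12 SPs is not reassuring.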