There are a number of statistics that can be used to determine inter-rater reliability. Different statistics are suited to different types of measurement. Some options are the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, intra-class correlation, and Krippendorff's alpha. If the raw data are available in the spreadsheet, use Inter-rater agreement in the Statistics menu to create the classification table and calculate kappa (Cohen 1960; Cohen 1968; Fleiss et al., 2003).

Bland and Altman[15] expanded on this idea by plotting the difference of each point, the mean difference, and the limits of agreement on the vertical axis against the average of the two ratings on the horizontal axis. The resulting Bland-Altman plot shows not only the overall degree of agreement, but also whether the agreement is related to the underlying value of the item. For example, two raters might agree closely when estimating the size of small objects but disagree about larger objects.

Inter-rater reliability (IRR) is the level of agreement between raters or judges. If everyone agrees, IRR is 1 (or 100%), and if everyone disagrees, IRR is 0 (0%). There are several methods of calculating IRR, from the simple (e.g. percent agreement) to the more complex (e.g. Cohen's kappa). Which one you choose depends largely on the type of data you have and the number of raters in your model.

As explained above, we found a substantial number of diverging ratings only with the more conservative approach to calculating the ROI. We looked at factors that could influence the likelihood of diverging ratings. Neither the sex of the child nor whether the child was rated by two parents or by a parent and a teacher systematically influenced this likelihood. Bilingualism was the only factor examined that increased the likelihood that a child would receive diverging ratings. It is possible that the diverging ratings for the small group of bilingual children reflect systematic differences in the vocabulary used in the two settings: monolingual German daycares and bilingual family environments. Larger groups and more systematic variation in the characteristics of the bilingual environment are needed to determine whether bilingualism has a systematic effect on rater agreement, as suggested by this report, and, if so, where this effect originates.

Kappa is a way of measuring agreement or reliability that corrects for how often ratings might agree by chance. Cohen's kappa,[5] which works for two raters, and Fleiss' kappa,[6] an adaptation that works for any fixed number of raters, improve upon the joint probability of agreement in that they take into account the amount of agreement that could be expected to occur by chance.
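The difference between simple percent agreement and a chance-corrected statistic may be easier to see in a small calculation. The following is a minimal sketch for two raters and nominal categories, not a reference implementation; the function names and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which the two raters gave the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)  # observed agreement
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example data: ten items rated "yes" or "no" by two raters.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

print(percent_agreement(rater_a, rater_b))
print(cohens_kappa(rater_a, rater_b))
```

In this made-up data set the raters agree on 8 of 10 items, but because both say "yes" 60% of the time, about half of that agreement would be expected by chance, so kappa comes out noticeably lower than the raw percentage.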

The original versions suffered from the same problem as the joint probability of agreement in that they treat the data as nominal and assume the ratings have no natural ordering; if the data do have a rank (ordinal level of measurement), that information is not fully taken into account in the measurements. Later extensions of the approach included versions that could handle "partial credit" and ordinal scales.[7] These extensions converge with the family of intra-class correlations (ICCs), so there is a conceptually related way of estimating reliability for each level of measurement, from nominal (kappa) to ordinal (ordinal kappa or ICC) to interval (ICC or ordinal kappa) and ratio (ICC).
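One common way to give "partial credit" on an ordinal scale is a weighted kappa, in which a disagreement counts more the further apart the two ratings are. The sketch below assumes linear or quadratic disagreement weights and two raters; it is only an illustration of the idea, and the function name, the `power` parameter and the example ratings are hypothetical rather than taken from the cited sources.

```python
def weighted_kappa(ratings_a, ratings_b, categories, power=1):
    """Weighted kappa for two raters on an ordered scale.

    categories: all possible categories, listed in rank order.
    power: 1 for linear disagreement weights, 2 for quadratic weights.
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2 or n == 0 or n != len(ratings_b):
        raise ValueError("need at least 2 categories and equal-length ratings")
    index = {c: i for i, c in enumerate(categories)}

    # Observed proportions for every pair of categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, 1 for the extreme pair.
            w = (abs(i - j) / (k - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("weighted kappa is undefined for these ratings")
    return 1 - weighted_observed / weighted_expected


# Hypothetical example: the same six items, with one disagreement that is
# either one step apart ("near") or two steps apart ("far") on a 1-3 scale.
scale = [1, 2, 3]
reference = [1, 2, 3, 1, 2, 3]
near_miss = [1, 2, 3, 2, 2, 3]
far_miss = [1, 2, 3, 3, 2, 3]

print(weighted_kappa(reference, near_miss, scale))
print(weighted_kappa(reference, far_miss, scale))
```

With only two categories every disagreement gets weight 1, so this calculation reduces to the unweighted Cohen's kappa; the ordering only matters once there are three or more ranked categories.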