Why the CBCL’s Inter-rater Reliability Is Poor at the Item and Syndrome Level
There seem to be no obvious methodological or statistical issues that could account for the poor inter-rater reliability. Embregts argues that informants might hold different views on the presence and severity of behavioral problems, depending on the frequency and quality of their interaction with the client (“biased” data). Second, the two judges might apply different standards of judgment when interpreting the behavioral data (“biased” interpretation).

Another reason for poor inter-rater reliability might be differences in the characteristics of the judges. As the observer effect familiar from physics suggests, we cannot observe a process without influencing it; that notion is valid on a molecular level and, in my opinion even more so, on a behavioral level. Different behavior by the judge might lead the client to act differently. Ultimately, data collection is biased if the judge’s behavior elicits behavior that would not be observed in a “normal situation”. This is a limitation we have to live with when conducting non-experimental studies. Diagnostic overshadowing is another process that could influence the assessment of psychopathology and related behaviors: judges might be inclined to attribute symptoms that overlap between mental retardation and psychopathology to the mental retardation. Finally, the author hypothesizes that a low level of intellectual functioning makes it hard for the judge to put him- or herself at the client’s mental level.

It might also be that there is simply no single perfect treatment and that many roads lead to Rome, so to speak. If different treatments are found to be effective and to have about the same effect, inter-rater reliability should be computed in relation to treatment families rather than single specific treatments; otherwise the inter-rater reliability estimate might be much too conservative.
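The point about treatment families can be made concrete with a small calculation. The sketch below uses Cohen's kappa, one common chance-corrected agreement statistic; the source does not say which statistic should be used, so this choice is an assumption. The treatment labels ("A1", "A2", "B1", "B2"), the family mapping, and the ratings of the two judges are all invented for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each assign one category per case."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: each judge picks one specific treatment per client.
judge_1 = ["A1", "A2", "A1", "B1", "B2", "A2", "B1", "A1"]
judge_2 = ["A2", "A2", "A1", "B2", "B2", "A1", "B1", "A2"]

# Hypothetical mapping of specific treatments to treatment families.
family = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

kappa_specific = cohens_kappa(judge_1, judge_2)
kappa_family = cohens_kappa(
    [family[t] for t in judge_1],
    [family[t] for t in judge_2],
)

print(f"kappa, specific treatments: {kappa_specific:.2f}")  # 0.33
print(f"kappa, treatment families:  {kappa_family:.2f}")    # 1.00
```

In this made-up example the judges never disagree about the family of treatment, only about the specific treatment within a family, so agreement computed on specific treatments understates how similar their judgments are.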