Archive for the ‘Research Methodology’ Category
RCTs are great and every empirical scientist loves them. Everyone? No, there is a small village in the south of France… Well, to be honest, I believe that RCTs are just a hype. Of course they were great in reducing a lot of statistical problems, and they have helped psychology to become a “respected science” and earn a place between mother philosophy and father medicine. In retrospect, however, the big advances in psychology have all been made by individuals who used single-case designs (Freud, Piaget, Skinner…). Why is that the case?
As much as RCTs can tell us about the statistical differences between groups, they are not very good at telling us what the underlying processes are. RCTs are also often conducted in special settings, with high treatment fidelity and lots of resources, something the “real world” often does not have to offer. All of this makes it complicated to derive any practical treatment value from them. If there is some practical value, one has to search for it by reading the whole article and giving it a few hours of thought.
Scientists write for scientists (or rather, they write to please their peer reviewers). This is a mindset quite far from that of the therapist, who lives in a world in which all the factors that are excluded from RCTs (for obvious reasons), such as economic and technical problems, comorbidity, and lack of resources, converge and interact with each other.
The mindset of the therapist often does not entail thoughts of simple causality, of X influencing Y, but of a multidimensional system in which all factors interact with each other. Thus the informational value derived from RCTs might seem huge to the scientist, but low to the therapist who needs support in treatment decisions.
Effect size is a parametric statistic, belonging to the GLM family, that allows one to compare the size of an effect between two groups (or, for that matter, between two conditions in one group). It can be used in all kinds of research designs (experimental and non-experimental) that employ different kinds of measurements and/or manipulations. Given that the study itself is not biased, it offers an easy way to compare the effects of different manipulations with each other. It also gives an indication of the real-life value of a given intervention (when compared to baseline). The concept of effect size is one of the core paradigms in scientific evidence-based therapy research and has stimulated the development of effective therapies by providing a “gold” standard against which effects can be compared. Recently Cohen’s system of labelling different effect sizes has been criticized; still, I would consider it scientific mainstream knowledge, and its wide use and acceptance in the scientific community reinforces its use.
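To make the idea concrete, here is a minimal sketch of the most common between-groups effect size, Cohen’s d (the standardized mean difference with a pooled standard deviation). The function name and the example data are my own illustration, not taken from any particular study:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference between group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 in the denominator)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across both groups
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical outcome scores for a treatment and a control group
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(round(cohens_d(treatment, control), 2))  # → 1.26
```

By Cohen’s (criticized but still widely used) labels, d ≈ 0.2 counts as small, 0.5 as medium, and 0.8 as large, so the hypothetical value above would be labelled a large effect.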
There seem to be no obvious methodological or statistical issues that could account for the poor inter-rater reliability. Embregts argues that informants might have a different view of the presence and severity of behavioral problems depending on the frequency and quality of interaction they have with the client (“biased” data). Secondly, the two judges might have different standards of judgment when interpreting the behavioral data (“biased” interpretation). Another reason for poor inter-rater reliability might be differing characteristics of the judges. As Albert Einstein already noted, we cannot observe a process without influencing it – a notion that is valid on a molecular level and (even more so, in my opinion) on a behavioral level. Different behaviors by the judge might lead the client to act differently. Ultimately, data collection would be biased if the behavior of the judge elicits behavior that would not be observed in a “normal” situation. This is a limitation we have to live with when conducting non-experimental studies.

Diagnostic overshadowing is another process that could influence the assessment of psychopathology and related behaviors: judges might be inclined to attribute the overlap between symptoms of mental retardation and psychopathology to the mental retardation. Finally, the author hypothesizes that a low level of intellectual functioning makes it hard for the judge to project himself into the client’s mental level.

It might also be that there is just not one perfect treatment, and that many roads lead to Rome, so to speak. If it is found that different treatments are effective and have about the same effect, inter-rater reliability should be computed in relation to treatment families and not single specific treatments; otherwise the IR rating might be much too conservative.
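For readers who want to see what such an inter-rater reliability figure actually measures, here is a minimal sketch of Cohen’s kappa, the standard chance-corrected agreement index for two judges rating the same cases into categories. The function and the two judges’ ratings are hypothetical illustrations, not data from the study discussed above:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: proportion of cases both judges rated identically
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each judge's marginal category frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical judgments: does the client show the behavioral problem?
judge_1 = ["yes", "yes", "no", "no"]
judge_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(judge_1, judge_2))  # → 0.5
```

Note that kappa drops toward zero when agreement is no better than chance, which is exactly why a rating scheme that splits one treatment family into many specific labels can look “too conservative”: judges who agree on the family but pick different specific labels are scored as full disagreements.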