higher on the hierarchy for assessing intervention effectiveness than do correlational studies.
At the bottom of the hierarchy are the following types of studies:
Anecdotal case reports
Pretest-posttest studies without control groups
Qualitative descriptions of client experiences during or after treatment
Surveys of clients asking what they think helped them
Surveys of practitioners asking what they think is effective
Residing at the bottom of the hierarchy does not mean that these studies have no evidentiary value regarding the effectiveness of interventions. Each of these types of studies can have significant value. Although none of them meet the three criteria for inferring causality (i.e., establishing correlation and time order while eliminating plausible alternative explanations), they each offer some useful preliminary evidence that can inform practice decisions when higher levels of evidence are not available for a particular type of problem or practice context. Moreover, each can generate hypotheses about interventions that can then be tested in studies providing more control for alternative explanations.
Table 3.1 is an example of a research hierarchy representing the various types of studies and their levels on the evidentiary hierarchy for answering EIP questions about effectiveness and prevention. Effectiveness evidence hierarchies are the most commonly described hierarchies in research, but we could create an analogous list for each of the different types of EIP questions.
TABLE 3.1 Evidentiary Hierarchy for EIP Questions about Effectiveness
Level | Type of study |
---|---|
1 | Systematic reviews and meta-analyses |
2 | Multisite replications of randomized experiments |
3 | Randomized experiment |
4 | Quasi-experiments |
5 | Single-case experiments |
6 | Correlational studies |
7 | Pretest/posttest studies without control groups |
8 | Other: anecdotal case reports; qualitative descriptions of client experiences during or after treatment; surveys of clients about what they think helped them; surveys of practitioners about what they think is effective |
Note: Best evidence at Level 1.
Notice that we have not yet discussed the types of studies residing in the top two levels of that table. You might also notice that Level 3 contains the single term randomized experiment. What distinguishes that level from the top two levels is the issue of replication. We can have more confidence about the results of an experiment if its results are replicated in other experiments conducted by other investigators at other sites. Thus, a single randomized experiment is below multisite replications of randomized experiments on the hierarchy. This hierarchy assumes that each type of study is well designed. If not well designed, then a particular study would merit a lower level on the hierarchy. For example, a randomized experiment with egregiously biased measurement would not deserve to be at Level 3 and perhaps would be so fatally flawed as to merit dropping to the lowest level. The same applies to a quasi-experiment with a severe vulnerability to a selectivity bias.
Typically, however, replications of experiments produce inconsistent results (as do replications of studies using other designs). Moreover, replications of studies that evaluate different interventions relevant to the same EIP question can accumulate and produce a bewildering array of disparate findings as to which intervention approach is the most effective. The studies at the top level of the hierarchy – systematic reviews (SR) and meta-analyses – attempt to synthesize and develop conclusions from the diverse studies and their disparate findings. Thyer (2004) described SRs as follows:
In an SR, independent and unbiased researchers carefully search for every published and unpublished report available that deals with a particular answerable question. These reports are then critically analyzed, and – whether positive or negative, whether consistent or inconsistent – all results are assessed, as are factors such as sample size and representativeness, whether the outcome measures were valid, whether the interventions were based on replicable protocols or treatment manuals, what the magnitude of observed effects were, and so forth. (p. 173)
Although systematic reviews often will include and critically analyze every study they find, not just randomized experiments, they should give more weight to randomized experiments than to less controlled studies in developing their conclusions. Some systematic reviews, such as those registered with the Campbell or Cochrane collaborations, require researchers to meet strict standards related to methods used to find studies and quality standards for the studies that will or will not be included in the review itself.
A more statistically oriented type of systematic review is called meta-analysis. Meta-analyses often include only randomized experiments, but sometimes include quasi-experimental designs and other types of studies as well. The main focus of meta-analysis is to aggregate the statistical findings of different studies that assess the effectiveness of a particular intervention. A prime aim of meta-analysis is to calculate the average strength of an intervention's effect by aggregating the effect strength reported in each individual study. Meta-analyses also can assess the statistical significance of the aggregated results. When meta-analyses include studies that vary in terms of methodological rigor, they also can assess whether the aggregated findings differ according to the quality of the methodology. The most powerful approach to a systematic review combines rigorous and transparent search methods, clear criteria for including and excluding studies, and statistical aggregation of data.
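To make the aggregation step concrete, here is a minimal sketch in Python of one common way to pool effect sizes: a fixed-effect, inverse-variance weighted average. The text does not specify a weighting scheme, so the weighting, the function name pooled_effect, and the input numbers are illustrative assumptions, not the procedure of any particular meta-analysis.

```python
import math


def pooled_effect(effect_sizes, variances):
    """Pool study-level effect sizes with inverse-variance (fixed-effect) weights.

    Assumption: each study reports an effect size (e.g., a standardized mean
    difference) together with the sampling variance of that effect size.
    Returns (pooled effect, standard error, z statistic, two-sided p-value).
    """
    if not effect_sizes or len(effect_sizes) != len(variances):
        raise ValueError("need exactly one variance per effect size")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * d for w, d in zip(weights, effect_sizes)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)

    # Statistical significance of the aggregated result (normal approximation).
    z = pooled / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, standard_error, z, p_value


# Hypothetical inputs, invented purely for illustration (not real study data).
example_effects = [0.2, 0.5, 0.8]
example_variances = [0.04, 0.04, 0.04]
pooled, se, z, p = pooled_effect(example_effects, example_variances)
print(f"pooled effect = {pooled:.2f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.5f}")
```

Running the same function separately on the more rigorous and the less rigorous studies would be one simple way to check whether the aggregated findings differ according to methodological quality.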
Some meta-analyses will compare different interventions that address the same problem. For example, a meta-analysis might calculate the average strength of treatment effect across experiments that evaluate the effectiveness of exposure therapy in treating PTSD, then do the same for the effectiveness of eye movement desensitization and reprocessing (EMDR) in treating PTSD, and then compare the two results as a basis for considering which treatment has a stronger impact on PTSD.
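The comparison described above can be sketched as follows. This is a minimal illustration, assuming inverse-variance pooling within each intervention and independent sets of studies; the function names pool and compare_interventions, the labels "A" and "B", and all numbers are hypothetical and do not represent findings about any real treatment.

```python
import math


def pool(studies):
    """Return the inverse-variance weighted mean effect and its variance.

    `studies` is a list of (effect_size, sampling_variance) pairs.
    """
    weights = [1.0 / variance for _, variance in studies]
    total_weight = sum(weights)
    mean = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total_weight
    return mean, 1.0 / total_weight


def compare_interventions(studies_a, studies_b):
    """Compare the pooled effects of two interventions for the same problem.

    Assumption: the two sets of studies are independent, so the variance of
    the difference is the sum of the two pooled variances.
    """
    mean_a, var_a = pool(studies_a)
    mean_b, var_b = pool(studies_b)
    difference = mean_a - mean_b
    z = difference / math.sqrt(var_a + var_b)
    return {"mean_a": mean_a, "mean_b": mean_b, "difference": difference, "z": z}


# Hypothetical (effect size, variance) pairs, invented for illustration only.
intervention_a = [(0.6, 0.02), (0.8, 0.02)]
intervention_b = [(0.3, 0.02), (0.5, 0.02)]
result = compare_interventions(intervention_a, intervention_b)
print(result)
```

A larger absolute z suggests that the gap between the two pooled effects is less likely to reflect chance alone, although a comparison of this kind says nothing about whether the underlying studies were unbiased.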
You can find some excellent sources for unbiased systematic reviews and meta-analyses in Table 2.2 in Chapter 2. Later in this book, Chapter 8 examines how to critically appraise systematic reviews and meta-analyses. Critically appraising them is important because not all of them are unbiased or of equal quality. It is important to remember that to merit a high level on the evidentiary hierarchy, an experiment, systematic review, or meta-analysis needs to be conducted in an unbiased manner. In that connection, what we said earlier about Table 3.1 is very important, and thus merits repeating here:
This hierarchy assumes that each type of study is well designed. If not well designed, then a particular study would merit a lower level on the hierarchy.
For example, a randomized experiment with egregiously biased measurement would not deserve to be at Level 3 and perhaps would be so fatally flawed as to merit dropping to the lowest level. The same applies to a quasi-experiment with a severe vulnerability to a selectivity bias.
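As a rough illustration of this caveat, the sketch below encodes the levels of Table 3.1 as a lookup and drops a fatally flawed study to the lowest level. The names EVIDENCE_LEVELS and appraised_level are hypothetical, and the yes/no flag is a deliberate simplification: the text says only that a poorly designed study merits a lower level, not how much lower.

```python
# Levels transcribed from Table 3.1 (1 = best evidence).
EVIDENCE_LEVELS = {
    "systematic review or meta-analysis": 1,
    "multisite replications of randomized experiments": 2,
    "randomized experiment": 3,
    "quasi-experiment": 4,
    "single-case experiment": 5,
    "correlational study": 6,
    "pretest/posttest study without control group": 7,
    "other": 8,
}
LOWEST_LEVEL = max(EVIDENCE_LEVELS.values())


def appraised_level(study_type, fatally_flawed=False):
    """Return the hierarchy level for a study after critical appraisal.

    Assumption: a study judged fatally flawed (e.g., egregiously biased
    measurement) is treated as lowest-level evidence; milder flaws would
    call for a judgment that this sketch does not attempt to model.
    """
    nominal_level = EVIDENCE_LEVELS[study_type]
    return LOWEST_LEVEL if fatally_flawed else nominal_level


print(appraised_level("randomized experiment"))
print(appraised_level("randomized experiment", fatally_flawed=True))
```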
3.3.5