difference in the outcome had every treated subject been “treated,” versus the counterfactual outcomes had every “treated” subject taken the other intervention. Notice, in a randomized experiment, ATT is equivalent to ATE.
● Compliers’ average treatment effect (CATE): In RCTs or observational studies, there is an interest in understanding the causal treatment effect for those who complied with their assigned interventions (Frangakis and Rubin 2002). Such interest generates an estimate of the CATE as described below.
Regarding the CATE, let us first consider the scenario in a randomized experiment. In an intention-to-treat analysis, we compare individuals assigned to the treatment group (but who did not necessarily receive it) with individuals assigned to the control group (some of whom might have received the treatment). This comparison is valid due to the random assignment, but it does not necessarily produce an estimate of the effect of the treatment, rather it estimates the effect of assigning or prescribing a treatment. The instrumental variables estimator in this case adds an assumption and modifies the intention-to-treat estimator to an estimator of the effect of the treatment. The key assumption is that the assignment has no causal effect on the outcome except through a causal effect on the receipt of the treatment.
In general, we can think of there being four types of individuals characterized by the response to the treatment assignment. There are individuals who always receive the treatment, regardless of their assignment, the “always-takers.” There are individuals who never receive the treatment, regardless of their assignment, the “never-takers.” For both of these subpopulations, the key assumption is that there is no effect of the assignment whatsoever. Then there are individuals who will always comply with their assignment, the “compliers.” We typically rule out the presence of the fourth group, the “defiers” who do the opposite of their assignment. We can estimate the proportion of compliers (assuming no defiers) as the share of treated among those assigned to the treatment minus the share of treated among those assigned to the control group. The instrumental variables estimator is then the ratio of the intent-to-treat effect on the outcome divided by the estimated share of compliers. This has the interpretation of the average effect of the receipt of the treatment on the outcome for the subpopulation of the compliers, referred to as the “local average treatment effect” or the complier average treatment effect.
Beyond the setting of a completely randomized experiment with non-compliance where the assignment is the instrument, these methods can also be used in observational settings. For example, ease of access to medical services as measured by distance to medical facilities that provide such services has been used as an instrument for the effect of those services on health outcomes.
Note that in these descriptions – while commonly used in the comparative effectiveness literature – do not fully define the estimand, as they do not address the intercurrent event. However, it is possible to use the strategy proposed in the addendum to define estimand in observational studies when intercurrent events exist. For instance, we could define the hypothetical average treatment effect as the difference between the two counterfacturals assuming everybody takes treatment A versus everybody takes treatment B without intercurrent event.
2.5 Totality of Evidence: Replication, Exploratory, and Sensitivity Analyses
As briefly mentioned at the beginning of this chapter, it is a legitimate debate whether causation can be ascertained from empirical observations. The literature includes multiple examples of claims from observational studies that have been found not to be causal relationships (Ionetta 2005, Ryan et al. 2012, Hempkins et al. 2016 – though some have been refuted – Franklin et al. 2017). Unfortunately, unless we have a well-designed and executed randomized experiment where other possible causal interpretations can be ruled out, it is difficult to fully ensure that a causal interpretation is valid. Therefore, even after a comparative observational study using appropriate bias control analytical methods, it is natural to raise the following questions. “Can we believe the causation assessed from a single observational study? How much confidence should we place on the estimated causal effect? Is there any hidden bias not controlled for? Are there any critical assumptions that are violated?” Several of the guidance documents in Table 1.2 provide a structured high-level approach to understanding the quality level from a particular study and thus start to address these questions. Grimes and Schulz (2002) also summarized questions to ask to assess the validity of a causal finding from observational research including the temporal sequence, strength and consistency of the association, biological gradient and plausibility, and coherence with existing knowledge. To expand on these ideas, we introduce the concept of totality of evidence, which represents the strength of evidence that we used to make an opinion about causation. The totality of evidence should include the following elements:
● Replicability
● Implications from exploratory analysis
● Sensitivity analysis on the critical assumptions
First, let us discuss replicability. Figure 2.3 summarizes the well-accepted evidence paradigm in health care research.
Figure 2.3: Hierarchy of Evidence
Evidence generated from multiple RCTs is atop of the paradigm, followed by the single RCTs (Sackett et al. 1996, Masic et al. 2008). Similarly, for non-randomized studies, if we were able to conduct several studies for the same research question, for example, replicate the same study on different databases, then the evidence from all of those studies would be considered stronger than the evidence from any single observational study, as long as they were all reasonably designed and properly analyzed. Here is why. Assume the “false positive” chance of observing a causal effect in any study is 5%, and we only make the causal claim if all studies reflect a causal effect. If we have two studies, then the chance that both studies are “false positive” would be 5%*5%=0.25% (1 in 400). However, with a single study, the chance of false positive causal claim is 1 in 20. Thus, replication is an important component when justifying causal relationship.
However, as Vandenbroucke (2008) points out, proper replication in observational research is more challenging than for RCTs as challenges to conclusions from observational research are typically due to potential uncontrolled bias and not chance. For example, Zhang et al. (2016) described the setting of comparative research on osteoporosis treatments from claims data that was lacking bone mineral density values (an unmeasured confounder). Simply replicating this work in the same type of database with the same unmeasured confounder would not remove the concern with bias. Thus, replication that not only addresses the potential for chance findings but those involving different data or with different assumptions might be required.
The second element is implications from exploratory analysis and we will borrow the following example from Cochran (1972) for demonstration purposes.
For causes of death for which smoking is thought to be a leading contributor, we can compare death rates for nonsmokers and for smokers of different amounts, for ex-smokers who have stopped for different lengths of time but used to smoke the same amount, for ex-smokers who have stopped for the same length of time but used to smoke different amounts, and for smokers of filter and nonfilter cigarettes. We can do this separately for men and women and also for causes of death to which, for physiological reasons, smoking should not be a contributor. In each comparison the direction of the difference in death rates and a very rough guess at the relative size can be made from a causal hypothesis and can be put to the test.
Different from replicability, this approach follows the idea of “proof by contradiction.” That is, assuming there is causal relationship between the intervention and the outcome, what would be the possible consequences? If those consequences were not observed, then a causal relationship is questionable.
Lastly, each causal framework is based on assumptions. Therefore, the importance of sensitivity analysis should never be underestimated. The magnitude of bias induced by violating certain assumptions should be quantitatively assessed. For example, the Rosenbaum-Rubin sensitivity analysis (Rubin and Rosenbaum, 1983, JRSSB) was proposed to quantify the impact of a potential unmeasured confounder, though the idea could trace back to Cornfield et al. (1959). Sensitivity