variables can increase the bias in estimating the causal treatment effect rather than reduce it. Rubin (2001) suggested that if a covariate is neither associated with the treatment selection nor the outcome, then it should not be included in the models for propensity score estimation. Notice that the candidate covariates must be measured prior to intervention initiation to ensure they were not influenced by the interventions.
In general, there are three sets of covariates that we can consider for inclusion in the estimation model:
a. Covariates that are predictive of treatment assignment
b. Covariates that are associated with the outcome variable
c. Covariates that are predictive of both treatment assignment and the outcome
Given that only variables in category c are true confounders, it might be assumed that we should follow option c for selecting variables for the estimation model. Brookhart et al. (2006) conducted simulation studies to evaluate which variables to include and their results suggested c is the “optimal” one among the three choices. However, in a letter responding to their publication (Shrier et al. 2007), the authors argue that including covariates in both categories b and c has advantages. For instance, if a variable is not a true confounder but is strongly associated with the outcome (that is, it falls in category b), any random imbalance in that variable between the treatment groups will result in bias that could have been addressed by including the variable in the propensity score. In real data analysis, identifying whether a covariate belongs to category b or c can be difficult unless researchers have prior knowledge of the relationships between covariates and interventions as well as between covariates and outcomes.
Directed acyclic graphs (DAGs), introduced in Chapter 2, can be a useful tool to guide the selection of covariates because a causal diagram can identify covariates that are prognostically important or that confound the treatment-outcome relationship (Austin and Stuart, 2015). A DAG is a graph whose nodes (vertices) are random variables, connected by directed edges (arrows), with no directed cycles. The edges represent the relationships between the random variables: an arrow from node A to node B can be interpreted as a direct causal effect of A on B (relative to the other variables on the graph). DAGs help identify covariates one should adjust for (for example, those in categories b or c above) and covariates that should NOT be included (for example, colliders and covariates on the causal pathway).
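As a rough illustration of how a DAG can drive covariate selection, the sketch below stores a small graph as a parent-to-children mapping and sorts covariates into categories a, b, and c. The graph, the variable names, and the helper functions `descendants`, `is_acyclic`, and `classify` are all hypothetical and are not taken from Figure 4.1. The classification rule is a deliberate simplification (reachability along directed paths), not a full back-door analysis.

```python
# Hypothetical DAG stored as parent -> children. Names are illustrative only
# and are not the covariates shown in Figure 4.1.
EDGES = {
    "insurance": ["treatment"],
    "age": ["outcome"],
    "baseline_pain": ["treatment", "outcome"],
    "treatment": ["dose_change", "outcome", "clinic_visit"],
    "dose_change": ["outcome"],          # on the causal pathway
    "outcome": ["clinic_visit"],         # clinic_visit is a collider
    "clinic_visit": [],
}


def descendants(edges, start, blocked=frozenset()):
    """Nodes reachable from `start` along directed edges, never entering `blocked`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for child in edges.get(node, []):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def is_acyclic(edges):
    """A graph is acyclic if no node can reach itself."""
    return all(node not in descendants(edges, node) for node in edges)


def classify(edges, covariate, treatment="treatment", outcome="outcome"):
    """Simplified rule: sort a covariate by which of treatment/outcome it can reach."""
    if covariate in descendants(edges, treatment):
        return "exclude: affected by treatment"
    affects_treatment = treatment in descendants(edges, covariate)
    # Paths to the outcome that do not pass through the treatment node.
    affects_outcome = outcome in descendants(edges, covariate, blocked={treatment})
    if affects_treatment and affects_outcome:
        return "c: confounder"
    if affects_treatment:
        return "a: treatment only"
    if affects_outcome:
        return "b: outcome only"
    return "unrelated: omit"


candidates = ["insurance", "age", "baseline_pain", "dose_change", "clinic_visit"]
labels = {name: classify(EDGES, name) for name in candidates}
for name, label in labels.items():
    print(f"{name:15s} {label}")
```

Variables downstream of treatment are flagged first, so mediators and colliders never reach the a/b/c sorting. In a real analysis the graph itself would come from expert opinion and prior research rather than from code.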
Figure 4.1 is a DAG created based on the simulated REFLECTIONS data analyses that will be conducted in Chapters 6 through 10. In these analyses, the interest was in estimating the causal effect of initiating opioid treatment (relative to initiating other treatments) on the change or the endpoint score in Brief Pain Inventory (BPI) pain severity scores from point of treatment initiation to one year following initiation. The covariates here were grouped into those that influence treatment selection only (insurance, region, income) and those that are confounders (influence treatment selection and outcome). Based on this DAG, the propensity score models in Section 4.3 contain all 15 of the confounding covariates (those influencing both treatment selection and the pain outcome measure).
Figure 4.1: DAG for Simulated REFLECTIONS Data Analysis
For developing a DAG, the choice of covariates should be based on expert opinion and prior research. In theory, one could use the outcome data in the current study to confirm any associations between outcome and the pre-baseline covariates. However, we suggest following the idea of “outcome-free design” in conducting observational studies, which means the researchers should avoid using any outcome data before finalizing the study design, including all analytic models.
There are other proposals in the literature for selecting covariates for propensity score estimation, and we list them here for reference. Rosenbaum (2002) proposed a selection method based on the significance of covariate differences between the two groups. He suggested including in the propensity score estimation models all baseline covariates on which group differences meet a low threshold for significance (for example, |t| > 1.5). Imbens and Rubin (2015) developed an iterative approach to identifying covariates for the estimation model. First, covariates believed to be associated with intervention assignment according to expert opinion or prior evidence are included. Second, regression models are built separately between the intervention indicator and each of the remaining covariates. If the likelihood ratio statistic for a covariate exceeds a pre-specified value, then that covariate is included.
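The |t| > 1.5 screening rule can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the data are made up, the function names `t_statistic` and `screen_covariates` are hypothetical, and an unequal-variance (Welch-type) t statistic is assumed because the text does not specify which form to use. The iterative Imbens and Rubin procedure is not shown.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(treated, control):
    """Two-sample t statistic with unequal variances (an assumed choice)."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


def screen_covariates(rows, treatment_key, covariates, threshold=1.5):
    """Keep covariates whose between-group |t| exceeds the threshold."""
    selected = []
    for name in covariates:
        treated = [r[name] for r in rows if r[treatment_key] == 1]
        control = [r[name] for r in rows if r[treatment_key] == 0]
        if abs(t_statistic(treated, control)) > threshold:
            selected.append(name)
    return selected


# Made-up baseline data: four treated and four comparison patients.
rows = [
    {"treated": 1, "age": 50, "baseline_pain": 7},
    {"treated": 1, "age": 52, "baseline_pain": 8},
    {"treated": 1, "age": 48, "baseline_pain": 6},
    {"treated": 1, "age": 51, "baseline_pain": 7},
    {"treated": 0, "age": 49, "baseline_pain": 4},
    {"treated": 0, "age": 51, "baseline_pain": 5},
    {"treated": 0, "age": 50, "baseline_pain": 3},
    {"treated": 0, "age": 52, "baseline_pain": 4},
]

selected = screen_covariates(rows, "treated", ["age", "baseline_pain"])
print(selected)
```

In these made-up data, age is nearly balanced between the groups and is dropped, while baseline pain differs sharply and is retained. Note that the screen uses only baseline covariates and the treatment indicator, so it remains compatible with an outcome-free design.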
In applied research, it may also be important to consider temporal effects in the estimation model. For instance, in a study comparing the effect of an older intervention with that of a newer intervention, subjects who entered the study in an earlier period might be more likely to receive the older intervention, whereas subjects who entered the study in a later period might be more likely to receive the newer intervention. Similarly, when a drug is first introduced on the market, physicians may initially try the new medication only in patients who have exhausted other treatment options and then gradually introduce it to a broader population. In these cases, time influences the intervention assignment and should be considered for the propensity model. In epidemiological research, this situation is called “channeling bias” (Petri and Urquhart 1991), and calendar time-specific propensity score methods (Mack et al. 2013, Dusetzina et al. 2013) have been proposed to incorporate the influence of the temporal period on the intervention assignment.
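To see why calendar time matters, the sketch below compares a pooled propensity estimate with one estimated separately within each calendar period. It is only an illustration under stated assumptions: the counts are made up, the helper names `make_rows` and `cell_propensity` are hypothetical, and the propensity is estimated as a simple treated fraction within cells (equivalent to a saturated model on one binary covariate) rather than by the regression models used in the cited papers.

```python
from collections import defaultdict


def make_rows(period, severe, n_new, n_old):
    """Build made-up patient records for one period-by-severity cell."""
    new = [{"period": period, "severe": severe, "new_drug": 1}] * n_new
    old = [{"period": period, "severe": severe, "new_drug": 0}] * n_old
    return new + old


def cell_propensity(rows, keys):
    """Estimate P(new_drug = 1) as the treated fraction within each cell of `keys`."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for r in rows:
        cell = tuple(r[k] for k in keys)
        counts[cell][0] += r["new_drug"]
        counts[cell][1] += 1
    return {cell: n_treated / n_total for cell, (n_treated, n_total) in counts.items()}


# Early period: the new drug is channeled mostly to severe patients.
# Late period: use of the new drug has broadened.
rows = (
    make_rows("early", severe=1, n_new=2, n_old=2)
    + make_rows("early", severe=0, n_new=1, n_old=3)
    + make_rows("late", severe=1, n_new=3, n_old=1)
    + make_rows("late", severe=0, n_new=2, n_old=2)
)

pooled = cell_propensity(rows, ["severe"])
by_period = cell_propensity(rows, ["period", "severe"])

print("pooled:   ", pooled)
print("by period:", by_period)
```

In this toy data set, a non-severe patient has a pooled propensity of 0.375, but the period-specific values are 0.25 early and 0.5 late. Ignoring the period would therefore assign the same score to patients whose actual chances of receiving the new drug differed.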
Hansen (2008) took a different approach from the propensity score to improve the quality of causal inference in non-randomized studies by introducing the prognostic score. Unlike propensity scores, whose purpose is to replicate the process that generates the intervention assignment, prognostic scores aim to replicate the process that generates the outcome. While the propensity score is a single measure of the covariates’ influence on the probability of treatment assignment, the prognostic score is based on a model of the covariates’ influence on the outcome variable. Thus, to estimate the prognostic score, the model will include covariates that are highly predictive of the outcome.
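A minimal sketch of a prognostic score follows. Several choices here are assumptions made for illustration and are not prescribed by the text: the outcome model is linear, it is fit on the comparison group only, the data are made up (and noise-free, so the fit is exact), and the helper names `fit_ols` and `prognostic_score` are hypothetical. The small least-squares solver is written out so that the example depends only on the standard library.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A'A | A'y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def prognostic_score(coefs, row):
    """Predicted outcome for one subject from the fitted outcome model."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Made-up comparison-group data: two baseline covariates and an outcome that
# follows y = 1 + 2*x1 - 0.5*x2 exactly.
control_X = [(1, 0), (2, 1), (3, 0), (4, 2), (0, 1)]
control_y = [3.0, 4.5, 7.0, 8.0, 0.5]

coefs = fit_ols(control_X, control_y)

# Score every subject, treated or not, with the model fit on the comparison group.
treated_X = [(2, 2), (3, 1)]
scores = [prognostic_score(coefs, row) for row in treated_X]
print(coefs, scores)
```

Each subject's covariates are reduced to a single predicted outcome, which can then be used for matching or stratification in much the same way as a propensity score. Fitting the model uses outcome data, so this step sits in tension with a strictly outcome-free design.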
The greatest strength of the propensity score is that it helps separate the design and analysis stages, but it is not without limitations. A recent study suggested that failure to include in the propensity score model a variable that is highly predictive of the outcome but not associated with treatment status can lead to increased bias and decreased precision in treatment effect estimates in some settings. To date, the use of the prognostic score, alone or in combination with the propensity score, has received only limited attention. Leacy and Stuart (2014) conducted simulation studies to compare the combined use of propensity and prognostic scores with the use of either score alone for matching and stratification-based analyses. Their simulation results suggested that the combined use exhibited strong-to-superior performance in terms of root mean square error across all simulation settings and scenarios. Furthermore, they found “[m]ethods combining propensity and prognostic scores were no less robust to model misspecification than single-score methods even when both score models were incorrectly specified.”
Recently, Nguyen and Debray (2019) extended the use of prognostic scores to comparisons of multiple interventions, proposed estimators for different estimands of interest, and empirically verified their validity through a series of simulations. Although prognostic scores are not addressed further in this book, their use is of potential value, and research is needed to further evaluate them and to provide best practices for their use in causal inference in applied settings.
4.2.2 Address Missing Covariates Values in Estimating Propensity Score
In large, real world health care databases such as administrative claims databases, missing covariate values are not uncommon. Because the propensity score of a subject is the conditional probability of treatment given all observed covariates, missing data for any covariate can make propensity score estimation more challenging. To address this issue, the following methods can be considered.
The first and simplest approach is to use only the observations without missing covariate values. This is called the complete case (CC) method. Clearly, ignoring patients with at least one missing covariate value is not a viable strategy with even moderate levels of missing data. Information from patients