Uwe Siebert

Real World Health Care Data Analysis



of the control group and its difference with the treatment group for each stratum, further calculate the weighted sum of the mean of the covariate for the treatment and control groups. An F statistic can then be constructed to test the null hypothesis that the mean for the treated subpopulation is identical to the mean for the control subpopulation within each stratum.

      4. Assess the balance within each stratum for each covariate. Similar to the first step, but construct the statistic for each covariate and each of the J strata. One test statistic value is therefore generated for every covariate-stratum combination, and it is useful to present Q-Q plots of those values. If the covariates are well balanced, we would expect the Q-Q plots to be flatter than a 45° line.
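      The within-stratum check in step 4 can be sketched in a few lines. The example below is a minimal illustration in Python using only the standard library, not the book's SAS code. It assumes a pooled-variance two-sample t statistic as the per-stratum statistic, which the text does not specify, and it runs on made-up data; the names `t_statistic`, `stratum_covariate_statistics`, and `qq_points` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def t_statistic(treated, control):
    """Two-sample t statistic with a pooled variance (an assumed choice)."""
    n1, n0 = len(treated), len(control)
    pooled = ((n1 - 1) * variance(treated) + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n0))


def stratum_covariate_statistics(rows, covariates, n_strata):
    """Return one t statistic per (covariate, stratum) pair."""
    stats = []
    for name in covariates:
        for j in range(n_strata):
            t = [r[name] for r in rows if r["stratum"] == j and r["treated"] == 1]
            c = [r[name] for r in rows if r["stratum"] == j and r["treated"] == 0]
            stats.append(t_statistic(t, c))
    return stats


def qq_points(stats):
    """Pair the sorted statistics with standard normal quantiles."""
    n = len(stats)
    normal = NormalDist()
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(stats)))


# Made-up data: treatment, stratum, and covariates are all drawn independently.
random.seed(1)
rows = [
    {"treated": random.randint(0, 1), "stratum": random.randrange(5),
     "age": random.gauss(50, 10), "bmi": random.gauss(30, 5)}
    for _ in range(1000)
]
points = qq_points(stratum_covariate_statistics(rows, ["age", "bmi"], 5))
for theoretical, observed in points:
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

      Plotting the second column against the first gives the Q-Q plot. Points that rise more slowly than the 45° line indicate statistics that are less dispersed than the reference distribution, which is the pattern the text associates with well-balanced covariates.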

      In general, it is not clear how to find the “best” set of propensity score estimates for a real world study. In some cases, the balance assessments might show one model to be clearly better than another. In other cases, however, some models may balance some covariates better and others less well (and not all covariates are equally important in controlling bias). As a general rule, as long as the estimated propensity scores induce reasonable balance between the treated and control groups, they can be considered “good” estimates, and researchers should be able to use them to control the bias caused by the confounding covariates when estimating the causal treatment effect. In Chapter 5, we will discuss those statistics as quality checks of propensity score estimates in more detail.

      In the REFLECTIONS study described in Chapter 3, the researchers were interested in comparing the BPI pain score at one year after intervention initiation between patients initiating opioids and those initiating any other (non-opioid) intervention. Based on the DAG assessment presented earlier, the following covariates were considered important confounders: age, gender, race, body mass index (BMI), doctor specialty, and baseline scores for pain severity (BPI-S), pain interference (BPI-I), disease impact (FIQ), disability score (SDS), depression severity (PHQ-8), physical symptoms (PHQ-15), anxiety severity (GAD-7), insomnia severity (ISI), cognitive functioning (MGH-CPFQ), and time since initial therapy (DxDur). The propensity score is the probability of receiving an opioid given these covariates. An initial assessment showed that DxDur was the only covariate with missing values, and that it was missing for 133 of the 1000 subjects. For demonstration purposes in this chapter, we use only MI to impute the missing DxDur values for propensity score estimation. Readers can implement the MP and MIMP methods for missing data imputation using the SAS code presented earlier in this chapter. The following sections demonstrate the a priori, automated model building, and gradient boosting approaches to estimating the propensity scores. Only histograms of the propensity scores are presented here; a full evaluation of the quality of the models is deferred until Chapter 5.

      First, a logistic regression model with the previously described covariates as main effects was constructed to estimate the propensity score. No interactions or other higher-order terms were added to the model because there is no strong clinical evidence suggesting that such terms exist. Program 14.1 implements an a priori model such as this. The estimated propensity score distributions for the opioid and non-opioid groups are shown in Figure 4.2. Note that code for this mirrored histogram plot is presented in Chapter 5.
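      To make the main-effects model concrete, the sketch below fits a logistic regression and returns each subject's estimated probability of treatment. It is a minimal Python illustration and not the SAS program referenced above. It assumes two made-up, roughly standardized covariates and a simulated treatment indicator, and it uses plain gradient ascent on the log-likelihood in place of the fitting routine of a statistical package. The names `fit_logistic` and `linear_predictor` are hypothetical.

```python
import math
import random


def sigmoid(f):
    """Numerically stable logistic function."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


def linear_predictor(beta, row):
    """Intercept plus main effects only (no interactions)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def fit_logistic(X, z, rate=0.5, n_iter=500):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, zi in zip(X, z):
            resid = zi - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for k, v in enumerate(row):
                grad[k + 1] += resid * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta


# Made-up data: two covariates and a treatment that depends on both.
random.seed(42)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
z = [1 if random.random() < sigmoid(-1.0 + 1.0 * a + 0.5 * b) else 0 for a, b in X]

beta = fit_logistic(X, z)
scores = [sigmoid(linear_predictor(beta, row)) for row in X]  # estimated propensity scores
```

      In practice the covariate list would be the confounders chosen a priori, and categorical covariates would need to be coded as indicator variables before fitting.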

      Figure 4.2: The Distribution of Estimated Propensity Score Using an a Priori Logistic Model

      From the histograms, we can see that the opioid group has higher estimated propensity scores than the non-opioid group. This is not surprising because the estimated propensity score is the probability of receiving an opioid, so in theory the opioid group should have a higher probability of receiving an opioid as long as the model contains factors related to treatment selection. In the opioid group, very few subjects had a very small chance of receiving an opioid (propensity score < 0.1). In the non-opioid group, quite a few subjects had a very small chance of receiving an opioid, which skewed the distribution of the estimated propensity score toward 0, that is, toward being less likely to receive an opioid.
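      The comparison described above amounts to binning the estimated scores by group and counting how many fall below a cut-off. The following Python sketch shows that tabulation on a handful of made-up scores; it is not the mirrored histogram code from Chapter 5, and the helper names `bin_counts` and `share_below` are hypothetical.

```python
def bin_counts(scores, n_bins=10):
    """Count scores in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return counts


def share_below(scores, cutoff=0.1):
    """Fraction of subjects whose estimated score is below the cut-off."""
    return sum(s < cutoff for s in scores) / len(scores)


# Made-up estimated propensity scores for illustration only.
opioid = [0.15, 0.35, 0.42, 0.55, 0.81]
non_opioid = [0.03, 0.05, 0.08, 0.22, 0.31, 0.47]

print(bin_counts(opioid))       # [0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(bin_counts(non_opioid))   # [3, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(share_below(opioid))      # 0.0
print(share_below(non_opioid))  # 0.5
```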

      Second, an automatic logistic model selection approach was implemented. (See Program 4.10.) In addition to the main effects, interactions were added to the model iteratively if the added term reduced the total number of imbalanced strata. In this example, we use five strata and consider a covariate imbalanced if its standardized difference is more than 0.25 in two or more strata. An interaction term was added if it was still imbalanced at the current iteration. The iterative process stops when adding interactions can no longer reduce the total number of imbalanced strata. The estimated propensity score distributions for the opioid and non-opioid groups are shown in Figure 4.3.
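      The imbalance rule used in this procedure can be written down directly. The Python sketch below, which is not the SAS program cited above, splits subjects into five strata by rank of the propensity score and counts, for each covariate, the strata in which the absolute standardized difference exceeds 0.25. The pooled-standard-deviation form of the standardized difference is an assumption, the data are made up, and the names `quintile_strata` and `imbalanced_strata` are hypothetical. The refitting loop that adds interactions is not shown.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation (assumed form)."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def quintile_strata(scores, n_strata=5):
    """Assign each subject to a stratum by rank of the propensity score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def imbalanced_strata(values, treated, strata, n_strata=5, threshold=0.25):
    """Number of strata in which one covariate exceeds the threshold."""
    count = 0
    for j in range(n_strata):
        t = [v for v, z, s in zip(values, treated, strata) if s == j and z == 1]
        c = [v for v, z, s in zip(values, treated, strata) if s == j and z == 0]
        if len(t) < 2 or len(c) < 2:
            continue  # too few subjects for a variance; skipped in this sketch
        if abs(standardized_difference(t, c)) > threshold:
            count += 1
    return count


# Made-up data: one covariate balanced by construction, one shifted in the treated group.
n = 100
scores = [(i + 0.5) / n for i in range(n)]
treated = [i % 2 for i in range(n)]
covariates = {
    "balanced": [float((i // 2) % 5) for i in range(n)],
    "shifted": [float(i % 7 + 5 * (i % 2)) for i in range(n)],
}

strata = quintile_strata(scores)
counts = {name: imbalanced_strata(v, treated, strata) for name, v in covariates.items()}
flags = {name: c >= 2 for name, c in counts.items()}  # imbalanced if 2 or more strata
print(counts)  # {'balanced': 0, 'shifted': 5}
print(flags)   # {'balanced': False, 'shifted': True}
```

      A selection loop would refit the model with a candidate interaction, recompute these counts, and keep the term only if the total across covariates goes down.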

      Figure 4.3: The Distribution of Estimated Propensity Score Using Automatic Logistic Model Selection

      The automatic logistic model selection resulted in distributions of the estimated propensity scores similar to those generated by the a priori logistic model, although the number of subjects with very low propensity score estimates (< 0.1) in the non-opioid group increased slightly. The output data from Program 4.8 provides all the details about this automatic model selection procedure (not shown here). In our case, six interactions are included in the final propensity score estimation model, and the model was able to reduce the number of imbalanced interaction terms from 48 to 34.

      Lastly, a boosted CART propensity score model was constructed with PROC GRADBOOST. (See Program 4.9.) The hyperparameters were tuned with five-fold cross-validation using a genetic algorithm (population size = 30), with the misclassification error as the objective function. An early stopping rule was applied to stop model fitting if the objective function did not improve in 10 iterations. Missing data are handled by default, so no imputation is needed.
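      The early stopping idea can be illustrated with a small, self-contained boosting loop. The Python sketch below is not PROC GRADBOOST and does not reproduce its genetic-algorithm tuning, cross-validation, or missing-data handling. It assumes one-split trees (stumps) fitted to the residuals of a logistic loss, a single made-up training and validation split, and a patience of 10 rounds on the validation misclassification error. All function names are hypothetical.

```python
import math
import random


def sigmoid(f):
    """Numerically stable logistic function."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


def candidate_splits(X, n_cuts=9):
    """Candidate (feature, threshold) pairs at evenly spaced sample quantiles."""
    splits = []
    for k in range(len(X[0])):
        values = sorted(row[k] for row in X)
        for q in range(1, n_cuts + 1):
            splits.append((k, values[q * len(values) // (n_cuts + 1)]))
    return splits


def fit_stump(X, residuals, splits):
    """Pick the one-split tree that fits the residuals best in squared error."""
    best = None
    for k, thr in splits:
        left = [r for row, r in zip(X, residuals) if row[k] <= thr]
        right = [r for row, r in zip(X, residuals) if row[k] > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, k, thr, lm, rm)
    return best[1:]


def stump_value(stump, row):
    k, thr, lm, rm = stump
    return lm if row[k] <= thr else rm


def error_rate(f_values, z):
    """Misclassification error when predicting treatment if the score is >= 0.5."""
    return sum((f >= 0) != (zi == 1) for f, zi in zip(f_values, z)) / len(z)


def boost(X_tr, z_tr, X_va, z_va, max_rounds=200, rate=0.5, patience=10):
    """Boost stumps; stop when validation error has not improved for `patience` rounds."""
    prior = sum(z_tr) / len(z_tr)
    f0 = math.log(prior / (1 - prior))
    splits = candidate_splits(X_tr)
    f_tr, f_va = [f0] * len(X_tr), [f0] * len(X_va)
    stumps, best_err, best_n = [], error_rate(f_va, z_va), 0
    for _ in range(max_rounds):
        residuals = [zi - sigmoid(f) for zi, f in zip(z_tr, f_tr)]
        stump = fit_stump(X_tr, residuals, splits)
        stumps.append(stump)
        f_tr = [f + rate * stump_value(stump, row) for f, row in zip(f_tr, X_tr)]
        f_va = [f + rate * stump_value(stump, row) for f, row in zip(f_va, X_va)]
        err = error_rate(f_va, z_va)
        if err < best_err:
            best_err, best_n = err, len(stumps)
        elif len(stumps) - best_n >= patience:
            break
    return (f0, rate, stumps[:best_n]), len(stumps)


def predict(model, row):
    """Estimated propensity score from the stumps kept at the best round."""
    f0, rate, stumps = model
    return sigmoid(f0 + rate * sum(stump_value(s, row) for s in stumps))


# Made-up data: two covariates, treatment depends on both.
random.seed(7)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
z = [1 if random.random() < sigmoid(-0.5 + 1.5 * a + 0.5 * b) else 0 for a, b in X]
model, rounds_tried = boost(X[:300], z[:300], X[300:], z[300:])
scores = [predict(model, row) for row in X]
```

      Keeping only the stumps up to the best validation round is one common way to apply early stopping; other implementations keep every fitted tree up to the stopping point.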

      The gradient boosting model resulted in distributions of the estimated propensity scores similar to those generated by the a priori logistic model or the automatically selected logistic model, as shown in Figure 4.4.

      Figure 4.4: The Distribution of Estimated Propensity Score Using Gradient Boosting

      This chapter introduced the propensity score, a commonly used tool for estimating causal treatment effects in non-randomized studies. This included a brief presentation of its theoretical properties to explain why the use of propensity scores can reduce bias in causal treatment effect estimates. Key assumptions of propensity scores were provided so that researchers can better evaluate the validity of the analysis results when propensity score methods are used. If some assumptions are violated, sensitivity analysis should be considered to assess the impact of the violation. Later in the book, in Chapter 13, we will discuss the existence of unmeasured confounding and the appropriate approach to address this issue.

      The main focus of the chapter was to provide guidance and SAS code for estimating propensity scores, because the true propensity score of a subject is usually unknown in observational research. Key steps covered in the discussion included: (1) selection of covariates included in the model, (2) addressing missing covariate values, (3) selection of an appropriate modeling approach, and (4) assessment of the quality of the estimated propensity score. For each element, possible approaches were discussed and recommendations made. We also provided SAS code to implement the best practices. We applied selected methods to estimate propensity scores of the intervention groups using the simulated real world REFLECTIONS data. The propensity score estimates will be further used to control for confounding bias in estimating the causal treatment effect between opioid and non-opioid groups via matching (Chapter 6), stratification (Chapter 7), and weighting (Chapter 8).