Uwe Siebert

Real World Health Care Data Analysis



Tipton’s Index

      Tipton (2014) proposed an index comparing the similarity of two cohorts as part of her work in the generalizability literature, which assesses how well re-weighting methods are able to generalize results from one population to another. Tipton showed that, under certain conditions, her index is a combination of the standardized difference and the ratio of variances. Thus, the Tipton index improves on the standardized difference alone by also detecting differences in scale between the distributions. The Tipton Index (TI) is calculated by the following formula applied to the distributions of the propensity scores for each treatment group:

      TI = Σ_{j=1}^{k} √(w_Aj × w_Bj)

      where, for strata j = 1 to k, w_Aj is the proportion of the Treatment A patients that are in stratum j and w_Bj is the proportion of Treatment B patients in stratum j. The recommended number of strata for calculating the index is based on the total sample size. The index takes on values from 0 to 1, with very high values indicating good overlap between the distributions. As a rule of thumb, an index score > 0.90 is roughly similar to the combination of a standardized mean difference < 0.25 and a ratio of variances between 0.5 and 2.0.
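As an illustrative sketch (the book's own implementations use SAS), the index can be computed in Python as follows. The equal-width stratification over the pooled range of scores and the caller-supplied stratum count are assumptions made for this example, not Tipton's exact recommendation:

```python
import math

def tipton_index(ps_a, ps_b, n_strata):
    """Tipton's (2014) index: the sum over propensity-score strata of
    sqrt(w_Aj * w_Bj), where w_Aj and w_Bj are the proportions of each
    treatment group falling in stratum j."""
    lo = min(min(ps_a), min(ps_b))
    hi = max(max(ps_a), max(ps_b))
    width = (hi - lo) / n_strata

    def stratum(ps):
        # clamp the maximum score into the last stratum
        return min(int((ps - lo) / width), n_strata - 1)

    def proportions(group):
        counts = [0] * n_strata
        for ps in group:
            counts[stratum(ps)] += 1
        return [c / len(group) for c in counts]

    w_a, w_b = proportions(ps_a), proportions(ps_b)
    return sum(math.sqrt(a * b) for a, b in zip(w_a, w_b))
```

Identical propensity distributions give an index of 1, while completely non-overlapping distributions give 0.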

      Imbens and Rubin (2015) propose a pair of summary measures based on individual patient differences to assess whether the overlap in baseline patient characteristics between treatments is sufficient to allow for statistical adjustment. The two proposed measures are the proportion of subjects in Treatment A having at least one similar matching subject in Treatment B and the proportion of subjects in Treatment B having at least one similar match in Treatment A. A subject is said to have a similar match if there is a subject in the other treatment group with a linearized propensity score value within 0.1 of that subject’s linearized propensity score. The linearized propensity score (lps) is defined as lps = ln(ps / (1 − ps)), where ps is the propensity score for the patient given their baseline covariates. Note that this statistic is most relevant when matching with replacement is used for the analytical method.

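A minimal Python sketch of these two overlap summaries (illustrative only; the function names are our own, and the book's implementations use SAS):

```python
import math

def lps(ps):
    """Linearized propensity score: the logit, ln(ps / (1 - ps))."""
    return math.log(ps / (1.0 - ps))

def near_match_proportions(ps_a, ps_b, caliper=0.1):
    """Imbens-Rubin overlap summaries: for each treatment group, the
    share of subjects with at least one subject in the other group
    whose linearized propensity score lies within `caliper`."""
    la = [lps(p) for p in ps_a]
    lb = [lps(p) for p in ps_b]

    def share_matched(xs, ys):
        matched = sum(1 for x in xs if any(abs(x - y) <= caliper for y in ys))
        return matched / len(xs)

    # (proportion of A matched in B, proportion of B matched in A)
    return share_matched(la, lb), share_matched(lb, la)
```

Values near 1 for both proportions indicate that nearly every subject has a close counterpart in the other treatment group.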

      Patients in the tails of the propensity score distributions are often trimmed, or removed, from the analysis data set. One reason is to satisfy the positivity assumption: each patient must have a probability of being assigned to either treatment that is greater than 0 and less than 1. This is one of the key assumptions for causal inference when using observational data. Secondly, when weighting-based analyses are performed, patients in the tails of the propensity distributions can have extremely large weights. This can inflate the variance and make the results dependent on a handful of patients.

      While many ad hoc approaches exist, Crump et al. (2009) and Baser (2007) proposed and evaluated a systematic approach for trimming to produce an analysis population. This approach balances the increase in variance due to the reduced sample size after trimming against the decrease in variance from removing patients who lack matches in the opposite treatment group (and who would therefore receive large weights in an adjusted analysis). Specifically, the algorithm finds the subset of patients with propensity scores between α and 1 − α that minimizes the variance of the estimated treatment effect. Crump et al. (2009) state that for many scenarios the simple rule of trimming to an analysis data set including all estimated propensity scores between 0.1 and 0.9 is near optimal.

      However, in some scenarios the sample size is large, and efficiency in the analysis is of less concern than the exclusion of patients from it. In keeping with the positivity assumption (see Chapter 2), a commonly used approach is to trim only (1) the Treatment A (treated) patients with propensity scores above the maximum propensity score in the Treatment B (control) group, and (2) the Treatment B patients with propensity scores below the minimum propensity score in the Treatment A group.
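Both trimming rules are simple to sketch. The Python functions below are illustrative stand-ins for what the PSMATCH procedure does, not the book's SAS code:

```python
def trim_minmax(ps_a, ps_b):
    """Min-max trimming in keeping with positivity: drop Treatment A
    (treated) patients above the maximum Treatment B (control) score,
    and Treatment B patients below the minimum Treatment A score."""
    a_kept = [p for p in ps_a if p <= max(ps_b)]
    b_kept = [p for p in ps_b if p >= min(ps_a)]
    return a_kept, b_kept

def trim_crump(ps, alpha=0.1):
    """Crump et al. (2009) rule of thumb: keep propensity scores in
    [alpha, 1 - alpha]; alpha = 0.1 is near optimal in many scenarios."""
    return [p for p in ps if alpha <= p <= 1.0 - alpha]
```

For example, a treated patient with a score above every control's score has no plausible comparator and is dropped by the min-max rule.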

      The PSMATCH procedure in SAS can easily implement the Crump rule of thumb, the min-max procedure, and other variations using the REGION= option (Crump: REGION=ALLOBS(PSMIN=0.1 PSMAX=0.9); min-max: REGION=CS(EXTEND=0)). We fully implement the Crump algorithm in Chapter 10 for scenarios with more than two treatment groups, where it is difficult to visually assess the overlap in the distributions. In this chapter, we follow the approaches available in the PSMATCH procedure.

      Recently, Li et al. (2016) proposed the concept of overlap weights to limit the need to trim the population simply to avoid large weights in the analysis. They propose an alternative target population of inference in addition to the ATE and ATT populations. Specifically, the overlap weights up-weight patients in the center of the combined propensity score distributions and down-weight patients in the tails. This is discussed in more detail in Chapter 8, but is mentioned here to emphasize that the need for trimming depends on the target population and the planned analysis method (for example, matching with calipers trims the population by definition). At a minimum, in keeping with the importance of the positivity assumption, we recommend trimming using the minimum/maximum method available in PSMATCH.
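For contrast with inverse-probability weights, which can explode for propensity scores near 0 or 1, the overlap weight assignment is a one-line sketch (illustrative Python, not the book's SAS code):

```python
def overlap_weight(ps, treated):
    """Overlap weight for one patient: treated patients receive 1 - ps
    and controls receive ps, so every weight lies in (0, 1) and
    patients in the tails are smoothly down-weighted rather than
    trimmed."""
    return 1.0 - ps if treated else ps
```

A treated patient with ps = 0.99 receives an overlap weight of about 0.01, whereas their ATE inverse-probability weight 1/ps would be near 1 and their control counterpart's weight 1/(1 − ps) would be about 100.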

      Once it has been determined that it is feasible to perform a comparative analysis, one last critical step in the design stage of the research is to confirm the success of the statistical adjustment (for example, the propensity score) for measured confounders. The success of a propensity score model is judged by the degree to which it balances the measured covariates between the treatment groups. Austin (2009) argues that comparing the balance between treatment groups for each and every potential confounder after the propensity adjustment is the best approach, and that assessment of the propensity distributions alone is informative but not sufficient for this step. Also, a good statistic for this “balance” assessment should be both independent of sample size and a function of the sample (Ho et al. 2007). Thus, the common practice of comparing baseline characteristics using hypothesis tests, which are highly dependent on sample size, is not recommended (Austin 2009). For these reasons, computing the standardized difference for each covariate has become the gold standard approach to assessing the balance produced by the propensity score. However, simply demonstrating similar means for two distributions does not imply similar distributions. Thus, further steps providing a fuller understanding of the comparability of the covariate distributions between treatments are recommended. In addition, ensuring similar distributions in each treatment group for each covariate does not ensure that interactions between covariates are the same in each treatment group.
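The two balance statistics at the heart of this assessment can be sketched as follows (illustrative Python using the pooled-standard-deviation form of the standardized difference; the book's implementations use SAS):

```python
import math

def balance_stats(x_a, x_b):
    """Absolute standardized difference of means (pooled-SD form) and
    the ratio of sample variances for one baseline covariate."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    pooled_sd = math.sqrt((var(x_a) + var(x_b)) / 2.0)
    std_diff = abs(mean(x_a) - mean(x_b)) / pooled_sd
    var_ratio = var(x_a) / var(x_b)
    return std_diff, var_ratio
```

Note that both statistics are functions of the sample alone and do not shrink or grow mechanically with sample size, unlike a hypothesis-test p-value.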

      We follow a modified version of Austin’s (2009) recommendations as our best practice for balance assessment. For each potential confounder:

      1. Compute the absolute standardized difference of the means and the ratio of variances.

      2. Compare the absolute standardized differences of