Uwe Siebert

Real World Health Care Data Analysis


                missed paid work to help your care in last 12 months
UnPdCaregiver   Have you used an unpaid caregiver in last 12 months
PdCaregiver     Have you hired a caregiver in last 12 months
Disability      Have you received disability income in last 12 months
SymDur          Duration (in years) of symptoms
DxDur           Time (in years) since initial Dx
TrtDur          Time (in years) since initial Trtmnt
SatisfCare_B    Satisfaction with Overall Fibro Treatment over past month
BPIPain_B       BPI Pain score at Baseline
BPIInterf_B     BPI Interference score at Baseline
PHQ8_B          PHQ8 total score at Baseline
PhysicalSymp_B  PHQ 15 total score at Baseline
FIQ_B           FIQ Total Score at Baseline
GAD7_B          GAD7 total score at Baseline
MFIpf_B         MFI Physical Fatigue at Baseline
MFImf_B         MFI Mental Fatigue at Baseline
CPFQ_B          CPFQ Total Score at Baseline
ISIX_B          ISIX total score at Baseline
SDS_B           SDS total score at Baseline
BPIPain_LOCF    BPI Pain score LOCF
BPIInterf_LOCF  BPI Interference score LOCF

      The primary objective in simulating a new PCI data set from the observational data was to produce a larger data set that allows us to illustrate more effectively the unsupervised, nonparametric Local Control alternative to conventional propensity score stratification (Chapter 7) and machine learning methods (Chapter 15). Starting from the observational data on 996 patients who received their initial PCI at Ohio Heart Health, Lindner Center, Christ Hospital, Cincinnati (Kereiakes et al., 2000), we generated this much larger data set via plasmode simulation. The simulated data set contains 11 variables on 15,487 patients with no missing values and is referred to as the PCI15K simulated data set. The key variables in the data set are described in Table 3.6. The treatment cohort for later analyses is represented by the variable THIN and the outcomes by SURV6MO (binary) and CARDCOST (continuous). Because the details of a process for generating simulated data were described for the REFLECTIONS example, only a brief summary and a listing of the final simulated data set variables are provided for the PCI15K data set.
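
      As a rough illustration of the general plasmode idea, the following Python sketch resamples whole covariate rows with replacement, so that the joint covariate structure of the source data carries over, and then draws a binary outcome from a known model. It is not the procedure used to build PCI15K. The toy rows, the function names plasmode_sample and toy_survival_probability, and the survival probabilities are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy "observed" covariate rows; these are NOT values from the Lindner data.
toy_observed = [
    {"thin": 0, "stent": 1, "female": 0, "diabetic": 0, "ejfract": 55},
    {"thin": 1, "stent": 1, "female": 1, "diabetic": 1, "ejfract": 40},
    {"thin": 1, "stent": 0, "female": 0, "diabetic": 0, "ejfract": 60},
    {"thin": 0, "stent": 0, "female": 1, "diabetic": 1, "ejfract": 35},
]

def toy_survival_probability(row):
    """Hypothetical outcome model: a known survival probability per patient."""
    return 0.95 + 0.03 * row["thin"]

def plasmode_sample(observed_rows, n_new, outcome_model, seed=0):
    """Build n_new simulated patients from the observed covariate rows."""
    rng = random.Random(seed)
    simulated = []
    for new_id in range(1, n_new + 1):
        # Copying a whole observed row keeps that patient's covariates together,
        # which preserves the correlation structure among the covariates.
        row = dict(rng.choice(observed_rows))
        row["patid"] = new_id
        # The outcome comes from a model whose true form is known to the analyst.
        row["surv6mo"] = 1 if rng.random() < outcome_model(row) else 0
        simulated.append(row)
    return simulated

simulated = plasmode_sample(toy_observed, 1000, toy_survival_probability, seed=42)
print(len(simulated), "simulated patients")
```

      Because the outcome model is specified by the analyst, the true treatment effect in such a data set is known, which is what makes simulated data of this kind useful for comparing analysis methods.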

      Table 3.6: PCI Simulated Data Set Variables

Variable Name  Variable Label
patid          Patient ID number: 1 to 15487
surv6mo        Binary PCI Survival variable: 1 => survival for at least six months following PCI, 0 => survival for less than six months
cardcost       Cardiac related costs incurred within six months of patient’s initial PCI; numerical values in 1998 dollars; costs were truncated by death for the 404 patients with surv6mo = 0
thin           Numeric treatment selection indicator: thin = 0 implies usual PCI care alone; thin = 1 implies usual PCI care augmented by either planned or rescue treatment with the new blood thinning agent
stent          Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO
height         Height in centimeters; numeric integer from 133 to 198
female         Female gender; numeric, with 1 meaning YES and 0 meaning NO
diabetic       Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO
acutemi        Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO
ejfract        Left ejection fraction; numeric value from 17 percent to 77 percent
ves1proc       Number of vessels involved in the patient’s initial PCI procedure; numeric integer from 0 to 5
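
      The ranges documented in Table 3.6 can be turned into a simple record check. The Python sketch below is one possible way to do this, not part of the book's code. The rule names, the check_record function, and the example record are hypothetical, and the rule that cardcost is nonnegative is an assumption, since the table gives no range for costs.

```python
def _is_binary(value):
    return value in (0, 1)

def _int_between(low, high):
    return lambda value: isinstance(value, int) and low <= value <= high

# One rule per variable, transcribed from the labels in Table 3.6.
PCI15K_RULES = {
    "patid": _int_between(1, 15487),
    "surv6mo": _is_binary,
    "cardcost": lambda value: value >= 0,  # assumption: costs are nonnegative
    "thin": _is_binary,
    "stent": _is_binary,
    "height": _int_between(133, 198),
    "female": _is_binary,
    "diabetic": _is_binary,
    "acutemi": _is_binary,
    "ejfract": lambda value: 17 <= value <= 77,
    "ves1proc": _int_between(0, 5),
}

def check_record(record):
    """Return the names of variables that are missing or outside the documented range."""
    return [
        name for name, is_valid in PCI15K_RULES.items()
        if name not in record or not is_valid(record[name])
    ]

# Hypothetical record, made up for illustration only.
example = {
    "patid": 1, "surv6mo": 1, "cardcost": 12000, "thin": 0, "stent": 1, "height": 170,
    "female": 0, "diabetic": 0, "acutemi": 0, "ejfract": 50, "ves1proc": 1,
}
print(check_record(example))
print(check_record(dict(example, height=210, ves1proc=6)))
```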

      Tables 3.7 and 3.8 summarize the outcome data from the original Lindner data and the simulated data, respectively. The two data sets are similar, with slightly narrower differences between treatment groups in the simulated data. In Chapters 7, 14, and 15, the PCI simulated data set is used for analysis and is named PCI15K.

      Table 3.7: Lindner Study (Kereiakes et al. 2000)

Group      Patients   Number Surviving Six Months   Percent Surviving Six Months   Average Cardiac Related Cost
Trtm = 0   298        283                           94.97%                         $14,614
Trtm = 1   698        687                           98.42%                         $16,127

      Table 3.8: PCI Blood Thinner Simulation

Group      Patients   Number Surviving Six Months   Percent Surviving Six Months   Average Cardiac Related Cost
Thin = 0   8476       8158                          96.25%                         $15,343
Thin = 1   7011       6925                          98.77%                         $15,643
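
      The statement that group differences are narrower in the simulated data can be checked directly from the counts in Tables 3.7 and 3.8. The short Python sketch below does that arithmetic; the names OUTCOMES and group_differences are illustrative and not from the book.

```python
# (patients, survivors, average cost in dollars), transcribed from Tables 3.7 and 3.8.
# Key 0 is the Trtm/Thin = 0 group and key 1 is the Trtm/Thin = 1 group.
OUTCOMES = {
    "Lindner": {0: (298, 283, 14614), 1: (698, 687, 16127)},
    "PCI15K": {0: (8476, 8158, 15343), 1: (7011, 6925, 15643)},
}

def group_differences(groups):
    """Return (survival difference in percentage points, cost difference in dollars),
    computed as group 1 minus group 0."""
    n0, s0, c0 = groups[0]
    n1, s1, c1 = groups[1]
    return 100 * (s1 / n1 - s0 / n0), c1 - c0

for name, groups in OUTCOMES.items():
    surv_diff, cost_diff = group_differences(groups)
    deaths = sum(n - s for n, s, _ in groups.values())
    print(f"{name}: survival +{surv_diff:.2f} points, cost +${cost_diff:,}, deaths {deaths}")
```

      With these inputs, the cost gap shrinks from $1,513 in the original data to $300 in the simulated data, and the survival gap shrinks from about 3.5 to about 2.5 percentage points. The 318 + 86 = 404 deaths in the simulated data match the count quoted for cardcost in Table 3.6.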

      In this chapter, two observational studies were introduced: the REFLECTIONS one-year study of patients with fibromyalgia and the Lindner study of patients undergoing PCI. The concept of plasmode simulation, in which one builds a simulated data set that retains the same variables and correlation structure as the original data, was introduced and applied to the REFLECTIONS and Lindner data sets. SAS IML code for the application to the REFLECTIONS data was provided, and the simulated data were shown to retain the characteristics of the original data. These two data sets (simulated REFLECTIONS and PCI15K) are used throughout the remainder of the book to illustrate the methods for real world data analysis presented in each chapter.

      Austin P (2008). Goodness-of-fit Diagnostics for the Propensity Score Model When Estimating Treatment Effects Using Covariate Adjustment With the Propensity Score. Pharmacoepi & Drug Safety 17: 1202-1217.

      Conover WG and Iman RL (1976). Rank Transformations in Discriminant Analysis.

      Franklin JM, Schneeweiss S, Polinski JM, Rassen J (2014). Plasmode simulation for the evaluation of pharmacoepidemiologic methods in complex healthcare databases. Comput Stat Data Anal 72: 219-226.

      Gadbury GL, Xiang Q, Yang L, Barnes S, Page GP, Allison DB (2008). Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates. PLoS Genet 4(6): e1000098.

      Kereiakes DJ, Obenchain RL, Barber BL, Smith A, McDonald M, Broderick TM, Runyon JP, Shimshak TM, Schneider JF, Hattemer CH, Roth EM, Whang DD, Cocks DL, Abbottsmith CW (2000). Abciximab provides cost effective survival advantage in high volume interventional practice. American Heart J 140: 603-610.

      Peng X, Robinson RL, Mease P, Kroenke K, Williams DA, Chen Y, Faries D, Wohlreich M, McCarberg B, Hann D (2015). Long-Term Evaluation of Opioid Treatment in Fibromyalgia. Clin J Pain 31: 7-13.

      Robinson RL, Kroenke K, Mease P, Williams DA, Chen Y, D’Souza D, Wohlreich M, McCarberg B (2012). Burden of Illness and Treatment Patterns for Patients with Fibromyalgia. Pain Medicine 13: 1366-1376.

      Wicklin R (2013). Simulating Data with SAS®. Cary, NC: SAS Institute Inc.

      Chapter 4: The Propensity Score

       4.1 Introduction

       4.2 Estimate Propensity Score

       4.2.1 Selection of Covariates

       4.2.2 Address Missing Covariates Values in Estimating Propensity Score

       4.2.3 Selection of Propensity Score Estimation Model

       A Priori Logistic Regression Model

       Automatic Parametric Model Selection

       Nonparametric Models

       4.2.4