• Chapter 5 covers requesting, obtaining, and loading the necessary data. In this chapter, we begin to work with our source data. This chapter marks the beginning of the discussion on the creation of the analytic files we will use to summarize our results and ultimately to answer our research questions.
• Chapter 6 defines beneficiary enrollment characteristics, including the creation of variables that indicate continuous enrollment, age, and geographic information.
• Chapter 7 presents code to calculate the aforementioned measurements of utilization.
• Chapter 8 presents code to calculate the aforementioned measurements of Medicare payment.
• Chapter 9 identifies common medical conditions by focusing on diabetes and COPD, and develops examples of basic measurements of quality outcomes for beneficiaries who have these conditions.
• Chapter 10 focuses on bringing the output of Chapter 5 through Chapter 9 together, using that output to answer the research questions, and presenting those answers. We will also discuss the steps involved in finalizing our work through documentation, preservation of code, and compliance with all of CMS’s Data Use Agreement6 policies, such as the destruction of data.
As you can see, another way of presenting the organization of this book is to say that the first four chapters are not focused on writing SAS code. Rather, they are focused on learning about the Medicare program, Medicare data, CMS’s systems, and the unique process of planning a research programming project that utilizes Medicare administrative data. Again, this is intentional. It is important that the reader recognize that one must understand the Medicare program in order to work successfully with Medicare data, and one must understand Medicare data in order to properly answer research questions with it. Therefore, the first four chapters lay a foundation for the remaining chapters, with the coding and the actual work of answering our example research questions occurring in Chapter 5 through Chapter 10.
How to Use This Book
Readers will come to this book with different levels of programming experience and different levels of exposure to Medicare administrative data. I recommend the following approach based on the reader’s experience:
• Those readers with experience in SAS, but with no background in healthcare or Medicare data, could focus more on Chapter 1 through Chapter 5, where we learn about the Medicare program, how the Medicare program drives the content of the data, and the acquisition of Medicare administrative claims and enrollment data.
• Those readers with knowledge of the Medicare program, but not of Medicare data or SAS, could focus more on Chapter 3, where we discuss Medicare data, as well as Chapter 6 through Chapter 10, where we develop the majority of the code.
For those readers using the code developed in this book, it is important to note that there will almost certainly be ways of making the code we write more efficient. I have consciously traded processing efficiency (e.g., shorter CPU and wall-clock time) for the clarity gained by writing code that steps plainly through a process, even if it involves coding additional DATA steps. Indeed, the full set of algorithms presented in Chapter 5 through Chapter 10 can be combined into fewer steps that require less reading and writing of large Medicare claims files, which would result in code that processes faster. However, our objective is to learn about programming with Medicare data, so we are giving up those efficiency gains in order to develop specific algorithms in a stepwise fashion, chapter by chapter. It would be a good exercise for you to revamp the code developed in Chapter 5 through Chapter 10 in order to reduce processing time (and I’d love to see your results!).
The online companion to this book is at http://support.sas.com/publishing/authors/gillingham.html. Here, you will find information on creating dummy source data, the code presented in this book, and answers to the exercises in each chapter. I expect you to visit the book’s website, create your own dummy source data, and run the code yourself.
Disclaimer
The synthetic data used for purposes of this book originated with the Centers for Medicare & Medicaid Services’ (CMS) Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF), which is available in the public domain. While the DE-SynPUF is derived from data that is used by CMS for operational purposes, the DE-SynPUF does not permit direct identification of any individuals because all direct identifiers have been removed. The author assumes no responsibility for the accuracy, completeness, or reliability of the DE-SynPUF, and assumes no responsibility for the consequences of any use of the data or algorithms contained in this book. The data are used herein without any representation or endorsement and without warranty of any kind, whether express or implied. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR INCIDENTAL DAMAGES RESULTING FROM, ARISING OUT OF OR IN CONNECTION WITH, THE USE OF THE DATA OR ALGORITHMS CONTAINED HEREIN.
Chapter Summary
In this chapter, we introduced the purpose of the book, described the framework of the book, and specified our example research programming project.
1 The idea of understanding Medicare data prior to writing any code is so important that the first four chapters of this book are focused on laying out the framework of the book; learning about the Medicare program, Medicare data, and CMS’s systems; and the unique planning process of a research programming project that utilizes Medicare administrative data. We do not begin to write any SAS code until Chapter 5!
2 Updating the year of study requires examining the choice of descriptive codes (such as procedure codes) discussed in later chapters. The need to choose relevant codes based on year of study is very common in health services research. In Chapter 10, we present an exercise that asks the reader to contemplate the changes we would need to make to update the text as if the demonstration program we are researching took place during the year 2015.
3 We will discuss the use of this file more in Chapter 5. Specifically, this file will serve two purposes in our work. First, we will use this file as a “finder file” that serves as the basis for our data extraction. In addition, we will use this file to assign responsibility for a beneficiary’s care to a provider (called attribution). We assume that this finder file was