James Blum

Fundamentals of Programming in SAS



HHIncome;

      run;

      Output 1.8.1: Expected Result from Program 1.8.1 (Colors and Fonts May Differ)

Analysis Variable : HHINCOME Total household income

MortgageStatus                                  N Obs       N      Mean   Std Dev    Minimum     Maximum
N/A                                            303342  303342  37180.59  39475.13  -19998.00  1070000.00
No, owned free and clear                       300349  300349  53569.08  63690.40  -22298.00  1739770.00
Yes, contract to purchase                        9756    9756  51068.50  46069.11   -7599.00   834000.00
Yes, mortgaged/ deed of trust or similar debt  545615  545615  84203.70  72997.92  -29997.00  1407000.00

      Case Study

      For additional practice, multiple case studies are available beyond the IPUMS CPS case study used in subsequent chapters. See Section 8.1 to apply the skills from this chapter to the Clinical Trials Case Study. Further case studies, including extensions to the IPUMS CPS case study, are available on the author pages.

      Chapter 2: Foundations for Analyzing Data and Reading Data from Other Sources

       2.1 Learning Objectives

       2.2 Case Study Activity

       2.3 Getting Started with Data Exploration in SAS

       2.3.1 Assigning Labels and Using SAS Formats

       2.3.2 PROC SORT and BY-Group Processing

       2.4 Using the MEANS Procedure for Quantitative Summaries

       2.4.1 Choosing Analysis Variables and Statistics in PROC MEANS

       2.4.2 Using the CLASS Statement in PROC MEANS

       2.5 User-Defined Formats

       2.5.1 The FORMAT Procedure

       2.5.2 Permanent Storage and Inspection of Defined Formats

       2.6 Subsetting with the WHERE Statement

       2.7 Using the FREQ Procedure for Categorical Summaries

       2.7.1 Choosing Analysis Variables in PROC FREQ

       2.7.2 Multi-Way Tables in PROC FREQ

       2.8 Reading Raw Data

       2.8.1 Introduction to Reading Delimited Files

       2.8.2 More with List Input

       2.8.3 Introduction to Reading Fixed-Position Data

       2.9 Details of the DATA Step Process

       2.9.1 Introduction to the Compilation and Execution Phases

       2.9.2 Building Blocks of a Data Set: Input Buffers and Program Data Vectors

       2.9.3 Debugging the DATA Step

       2.10 Validation

       2.11 Wrap-Up Activity

       2.12 Chapter Notes

       2.13 Exercises

      At the conclusion of this chapter, mastery of the concepts covered in the narrative includes the ability to:

       Apply the MEANS procedure to produce a variety of quantitative summaries, potentially grouped across several categories

       Apply the FREQ procedure to produce frequency and relative frequency tables, including cross-tabulations

       Categorize data for analyses in either the MEANS or FREQ procedures using internal SAS formats or user-defined formats

       Formulate a strategy for selecting only the necessary rows when processing a SAS data set

       Apply the DATA step to read data from delimited or fixed-position raw text files

       Describe the operations carried out during the compilation and execution phases of the DATA step

       Compare and contrast the input buffer and program data vector

       Apply DATA step statements to assist in debugging

       Apply the COMPARE procedure to compare and validate a data set against a standard

      Use the concepts of this chapter to solve the problems in the wrap-up activity. Additional exercises and case studies are also available to test these concepts.

      This section introduces a case study that is used as a basis for most of the concepts and associated activities in this book. The data comes from the Current Population Survey, as provided by the Integrated Public Use Microdata Series (IPUMS CPS). IPUMS CPS contains a wide variety of information; only a subset of the data collected from 2001 to 2015 is included in the examples here. Further, the data is introduced in segments, starting with simple sets of variables and eventually adding more information that must be assembled to achieve the objectives of each section.

      This chapter works with data that includes household-level information from the 2005 and 2010 IPUMS CPS data sets of over one million observations each. Included are variables on state, county, metropolitan area/city, household income, home value, mortgage status, ownership status, and mortgage payment. Outputs 2.2.1 through 2.2.4 show tabular summaries from the 2010 data, including quantitative statistics, frequencies, and/or percentages. Reproducing these tables in the wrap-up activity in Section 2.11 is the primary objective for this chapter.

      The first sample output, shown in Output 2.2.1, presents a set of six statistics on mortgage payments across metropolitan status for mortgages of $100 per month or more. Producing this table, and the slightly more complicated Output 2.2.2, requires understanding several components of the MEANS procedure.
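      As a rough preview of the components involved, the following is a minimal sketch of a PROC MEANS step that could produce a table of this shape. It is not the book's program. The libref and data set name (BookData.Ipums2010Basic) are hypothetical placeholders. The variable names MortgagePayment and Metro are taken from the output headings. The table header is cut off after Std, so the last two statistic keywords (MIN and MAX) are assumptions chosen only to reach six statistics.

```sas
/* Sketch only: BookData.Ipums2010Basic is a hypothetical data set name. */
/* The statistic keywords MIN and MAX are assumed, not confirmed.        */
proc means data=BookData.Ipums2010Basic n mean median std min max;
  where MortgagePayment ge 100;  /* keep mortgages of $100 per month or more */
  class Metro;                   /* one summary row per metropolitan status  */
  var MortgagePayment;           /* the analysis variable                    */
run;
```

      Each statement in the sketch corresponds to a topic covered later in the chapter: statistic keywords and the VAR statement in Section 2.4.1, the CLASS statement in Section 2.4.2, and the WHERE statement in Section 2.6.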

      Output 2.2.1: Basic Statistics on Mortgage Payments Grouped on Metropolitan Status

Analysis Variable : MortgagePayment Mortgage Payment
MetroNMeanMedianStd