Ron Cody, EdD

SAS Statistics by Example




      One of the first steps in any statistical analysis is to calculate some basic descriptive statistics on the variables of interest. SAS has a number of procedures that provide tabular as well as graphical displays of your data.

      To demonstrate some of the ways that SAS can produce descriptive statistics, we will use a data set called Blood_Pressure. This data set contains the variables Subj (ID value for each subject), Drug (with values of Placebo, Drug A, or Drug B), SBP (systolic blood pressure), DBP (diastolic blood pressure), and Gender (with values of M or F). Here is a listing of the first 25 observations from this data set:

[Listing: first 25 observations of the Blood_Pressure data set]

      Notice that some of the observations contain missing values, represented by periods for numeric values and blanks for character values.

      One way to compute means and standard deviations is to use PROC MEANS. Here is a program to compute some basic descriptive statistics on the two variables SBP and DBP:

libname example 'c:\books\statistics by example';
title "Descriptive Statistics for SBP and DBP";
proc means data=example.Blood_Pressure n nmiss mean std median maxdec=3;
   var SBP DBP;
run;

      Because the Blood_Pressure data set is a permanent SAS data set (when it was created, it was placed in a folder on a disk drive instead of in a temporary SAS folder that disappears when you end your SAS session), you need a LIBNAME statement to tell SAS where to find the data set. In this example, the data set is located in the c:\books\statistics by example folder. Remember that the name of a permanent SAS data set has two parts: the part before the period is a library reference (libref for short; here, example) that tells SAS where to find the data set, and the part after the period is the actual data set name (in this case, Blood_Pressure). If you were to use your operating system to list the contents of the c:\books\statistics by example folder, you would see a file called:

      Blood_Pressure.sas7bdat

      This file is the actual SAS data set and contains both the descriptor portion and the individual observations. The extension sas7bdat indicates that the data set is compatible with SAS 7 and later. This file is not a text file, and you cannot view it using a word processor or other Windows programs.

      The TITLE statement causes SAS to print the title at the top of every page of output until you change the title or turn off all titles. In this program, the title is enclosed in double quotation marks. You can also use single quotation marks (as long as there are no single quotation marks in the title) or, for that matter, no quotation marks at all (SAS is smart enough to realize that the text following the TITLE keyword is the title text).

      PROC MEANS is a popular SAS procedure that produces a number of useful statistics. In this program, the keyword DATA= tells the procedure that you want to produce descriptive statistics on the Blood_Pressure data set.

      You can control which statistics this procedure produces by using procedure options. These options are placed between the procedure name and the semicolon ending the statement, and you can place them in any order. If you omit these options, PROC MEANS will, by default, print the number of nonmissing observations, the mean, the standard deviation, the minimum value, and the maximum value.

      The first two options in this program, N and NMISS, report the number of nonmissing and missing values for each variable. The next three options, MEAN, STD, and MEDIAN, request the mean, standard deviation, and median. The last option, MAXDEC=n, specifies how many digits to the right of the decimal point you want in your report; here, MAXDEC=3 requests that all the statistics be reported to three decimal places.
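      To make the arithmetic behind these options concrete, here is a minimal Python sketch (not SAS) of what N, NMISS, MEAN, STD, MEDIAN, and MAXDEC=3 compute for a single variable. It assumes that a missing value is represented by None, standing in for the SAS period, and it uses the n - 1 divisor for the standard deviation, which matches the PROC MEANS default. The function name requested_stats and the sample values are hypothetical, not taken from the Blood_Pressure data set.

```python
import statistics


def requested_stats(values, maxdec=3):
    """Mimic N NMISS MEAN STD MEDIAN MAXDEC= for one variable.

    values is a list in which None marks a missing value.
    """
    present = [v for v in values if v is not None]  # drop missing values first
    raw = {
        "N": len(present),                      # nonmissing count
        "N Miss": len(values) - len(present),   # missing count
        "Mean": statistics.mean(present),
        "Std Dev": statistics.stdev(present),   # n - 1 divisor
        "Median": statistics.median(present),
    }
    # MAXDEC= affects only the displayed decimals, so round at formatting time
    return {
        name: value if name in ("N", "N Miss") else f"{value:.{maxdec}f}"
        for name, value in raw.items()
    }


# Hypothetical diastolic readings, not taken from Blood_Pressure
report = requested_stats([80, 85, None, 90, None, 78])
```

      Here the two None entries are counted by NMISS but excluded from every other statistic, which is why N is 4 rather than 6.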

      The following list describes some of the more useful options:

Option      Description
n           Number of nonmissing observations
nmiss       Number of observations with missing values
mean        Arithmetic mean
std         Standard deviation
stderr      Standard error of the mean
min         Minimum value
max         Maximum value
median      Median
maxdec=n    Maximum number of decimal places to display
clm         Lower and upper 95% confidence limits for the mean
cv          Coefficient of variation
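      Three of the statistics in this table are simple functions of N, the mean, and the standard deviation s: STDERR is s / sqrt(n), CV is 100 * s / mean (a percentage), and CLM is mean plus or minus t * STDERR, where t is the 0.975 quantile of the t distribution with n - 1 degrees of freedom. The Python sketch below shows these formulas; the function name extra_stats, the use of None for missing values, and the sample values are assumptions for illustration. Because Python's standard library has no t distribution, the critical value is passed in; if SciPy is installed, scipy.stats.t.ppf(0.975, n - 1) can supply it.

```python
import math
import statistics


def extra_stats(values, t_crit):
    """Standard error, coefficient of variation, and confidence limits for the mean.

    values: list with None for missing; t_crit: t critical value for n - 1 df.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.mean(present)
    std = statistics.stdev(present)
    stderr = std / math.sqrt(n)          # STDERR: s / sqrt(n)
    cv = 100 * std / mean                # CV: standard deviation as a percent of the mean
    half_width = t_crit * stderr         # CLM half-width
    return {
        "Std Error": stderr,
        "Coeff of Variation": cv,
        "Lower 95% CL": mean - half_width,
        "Upper 95% CL": mean + half_width,
    }


# Hypothetical values; 4 nonmissing -> 3 df, t(0.975, 3) is about 3.182
stats = extra_stats([120, None, 130, 140, 150], t_crit=3.182446)
```

      The limits are symmetric around the mean, and they widen as the sample shrinks, both because STDERR grows and because t grows with fewer degrees of freedom.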

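      The VAR statement tells the procedure which variables you want to analyze. If you omit a VAR statement, PROC MEANS produces statistics on all of the numeric variables in the specified data set. That is usually not a good idea, because the output will then include variables such as numeric ID values, for which means and standard deviations are meaningless.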

      Finally, the PROC step ends with a RUN statement. Here is the output:

[Output: PROC MEANS statistics for SBP and DBP]

      The data set Blood_Pressure also contains a variable called Drug. You might want to see the same statistics, but this time compute them for each level of Drug. One way to do this is to add a CLASS statement to PROC MEANS like this:

title "Descriptive Statistics Broken Down by Drug";
proc means data=example.Blood_Pressure n nmiss mean std median maxdec=3;
   class Drug;
   var SBP DBP;
run;

      The CLASS statement tells the procedure to produce the selected statistics for each unique value of Drug. This is a good time to tell you that when you have more than one statement in a PROC step (in this case, the CLASS and VAR statements), the order of these statements does not usually matter. The exceptions are certain statistical procedures in which you must specify your model before you ask for certain statistics.
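      Conceptually, the CLASS statement partitions the observations by the value of Drug and computes the statistics separately within each partition. A minimal Python sketch of that idea follows; the function name stats_by_class, the one-dictionary-per-observation layout, and the sample rows are assumptions for illustration, not SAS internals. It drops observations whose class value is missing, which mirrors the PROC MEANS default (the MISSING option changes that behavior in SAS).

```python
import statistics
from collections import defaultdict


def stats_by_class(rows, class_var, var):
    """Summarize `var` separately for each level of `class_var`.

    rows is a list of dicts (one per observation); None marks a missing value.
    """
    groups = defaultdict(list)
    for row in rows:
        level = row[class_var]
        # Skip observations with a missing class value (PROC MEANS default)
        if level is None or level == "":
            continue
        groups[level].append(row[var])

    summary = {}
    for level in sorted(groups):  # report the levels in sorted order
        values = groups[level]
        present = [v for v in values if v is not None]
        summary[level] = {
            "N": len(present),
            "N Miss": len(values) - len(present),
            "Mean": statistics.mean(present) if present else None,
        }
    return summary


# Hypothetical observations, not rows from Blood_Pressure
rows = [
    {"Drug": "Placebo", "SBP": 140},
    {"Drug": "Drug A", "SBP": 120},
    {"Drug": "Placebo", "SBP": None},
    {"Drug": "Drug B", "SBP": 130},
    {"Drug": "Drug A", "SBP": 124},
    {"Drug": "", "SBP": 150},
]
summary = stats_by_class(rows, "Drug", "SBP")
```

      Note that a missing SBP value is still counted (under N Miss) within its Drug group, whereas an observation with a blank Drug value disappears from the breakdown entirely.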

      Here is the output:

[Output: PROC MEANS statistics for SBP and DBP by Drug]

      You should always request both the N and NMISS options when you run PROC MEANS, because missing values are a possible source of bias.

      What if you want to see the grand mean, as well as the means broken down by Drug, all in one listing? The PROC MEANS option PRINTALLTYPES does this for you when you include a CLASS statement. Here is the modified program:

title "Descriptive Statistics Broken Down by Drug";
proc means data=example.Blood_Pressure n nmiss mean std median
           printalltypes maxdec=3;
   class Drug;
   var SBP DBP;
run;

      Here is the corresponding output:

[Output: PROC MEANS statistics for SBP and DBP, for all subjects and by Drug]

      Now you see statistics for each value of Drug and for all subjects, in the same listing.
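      One way to picture this combined listing is as one extra summary, computed over every observation used in the class breakdown, placed alongside the per-level summaries. The sketch below illustrates that idea in Python; it is not how SAS implements PRINTALLTYPES, and the names summarize and all_types, the "_ALL_" key for the grand total, and the sample rows are all hypothetical. As with a CLASS statement in PROC MEANS, observations with a missing Drug value are excluded from the overall row as well as from the individual levels.

```python
import statistics
from collections import defaultdict


def summarize(values):
    """N, N Miss, and Mean for a list where None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "N Miss": len(values) - len(present),
        "Mean": statistics.mean(present) if present else None,
    }


def all_types(rows, class_var, var):
    """Overall statistics plus one entry per class level, in one result."""
    # Drop observations with a missing class value from every summary
    kept = [r for r in rows if r[class_var] not in (None, "")]
    groups = defaultdict(list)
    for r in kept:
        groups[r[class_var]].append(r[var])
    result = {"_ALL_": summarize([r[var] for r in kept])}  # hypothetical grand-total key
    for level in sorted(groups):
        result[level] = summarize(groups[level])
    return result


# Hypothetical observations, not rows from Blood_Pressure
rows = [
    {"Drug": "Placebo", "SBP": 140},
    {"Drug": "Drug A", "SBP": 120},
    {"Drug": "Drug A", "SBP": None},
    {"Drug": "Drug B", "SBP": 130},
    {"Drug": None, "SBP": 160},
]
result = all_types(rows, "Drug", "SBP")
```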