Obs
Choosing Serial as a class variable results in each class being a single observation, making the mean, minimum, and maximum the same value and creating a situation where the standard deviation is undefined. Again, this would be an extreme case; however, class variables are best when structured to produce relatively few classes that represent a useful stratification of the data.
Of course, more than one variable can be used in a CLASS statement; the categories are then defined as all combinations of the categories from the individual variables. The order of the variables listed in the CLASS statement only alters the nesting order of the levels; therefore, the same information is produced in a different row order in the table. Consider the two MEANS procedures in Program 2.4.8.
Program 2.4.8: Using Multiple Class Variables and Effects of Order
proc means data=BookData.IPUMS2005Basic nonobs n mean std;
  class MortgageStatus Metro;
  var HHIncome;
run;

proc means data=BookData.IPUMS2005Basic nonobs n mean std;
  class Metro MortgageStatus;
  var HHIncome;
run;
Output 2.4.8A: Using Multiple Class Variables (Partial Listing)
Analysis Variable : HHIncome
| MortgageStatus | METRO | N | Mean | Std Dev |
| N/A | 0 | 19009 | 31672.81 | 32122.89 |
| | 1 | 48618 | 29122.73 | 29160.23 |
| | 2 | 69201 | 38749.69 | 46226.50 |
| | 3 | 73234 | 43325.25 | 42072.78 |
| | 4 | 93280 | 36514.56 | 36974.63 |
| No, owned free and clear | 0 | 30370 | 46533.14 | 50232.50 |
| | 1 | 85696 | 42541.06 | 44664.64 |
| | 2 | 27286 | 60011.10 | 76580.75 |
| | 3 | 76727 | 63925.99 | 75404.62 |
| | 4 | 80270 | 55915.02 | 66293.39 |
Output 2.4.8B: Effects of Order (Partial Listing)
Analysis Variable : HHIncome
| METRO | MortgageStatus | N | Mean | Std Dev |
| 0 | N/A | 19009 | 31672.81 | 32122.89 |
| | No, owned free and clear | 30370 | 46533.14 | 50232.50 |
| | Yes, contract to purchase | 1030 | 46069.26 | 36225.80 |
| | Yes, mortgaged/ deed of trust or similar debt | 41619 | 71611.01 | 55966.31 |
| 1 | N/A | 48618 | 29122.73 | 29160.23 |
| | No, owned free and clear | 85696 | 42541.06 | 44664.64 |
| | Yes, contract to purchase | 3034 | 42394.12 | 35590.14 |
| | Yes, mortgaged/ deed of trust or similar debt | 93427 | 62656.54 | 48808.66 |
The same statistics are present in both tables, but the primary ordering is on MortgageStatus in Output 2.4.8A as opposed to metropolitan status (Metro) in Output 2.4.8B. Two additional items are of note in this example. First, NONOBS is used in both procedures. By default, a CLASS statement produces a column for the number of observations in each class level (NOBS). This count may differ from the statistic N when the analysis variable has missing values, but that is not an issue for this example. Second, the numeric values of Metro have no clear meaning on their own. Titles and footnotes, as shown in Chapter 1, can add information about the meaning of these numeric values. However, a better solution is to build a format and apply it to that variable, a concept covered in the next section.
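To see the column that NONOBS suppresses, the option can simply be left out. The following is a minimal sketch rather than one of the numbered programs. The NMISS statistic is an addition made here for illustration, so that any gap between the observation count and N would be visible. Because the text states that missing data is not an issue for this example, the two counts are expected to agree for HHIncome.

```sas
/* Sketch: without NONOBS, the observation-count column appears next to N */
/* NMISS is added for illustration only; it counts missing HHIncome values */
proc means data=BookData.IPUMS2005Basic n nmiss mean std;
  class Metro;
  var HHIncome;
run;
```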
2.5 User-Defined Formats
As seen in Section 2.3, SAS provides a variety of formats for altering the display of data values. It is also possible to define formats using the FORMAT procedure. These formats are used to assign replacements for individual data values or for groups or ranges of data, and they may be permanently stored in a library for subsequent use. Formats, both native SAS formats and user-defined formats, are invaluable tools that are used in a variety of contexts throughout this book.
2.5.1 The FORMAT Procedure
The FORMAT procedure provides the ability to create custom formats, both for character and numeric variables. The principal tool used in writing formats is the VALUE statement, which defines the name of the format and its rules for converting data values to formatted values. Program 2.5.1 gives an example of a format written to improve the display of the Metro variable from the BookData.IPUMS2005Basic data set.
Program 2.5.1: Defining a Format for the Metro Variable
proc format;
  value Metro
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    2 = "Metro, Inside City"
    3 = "Metro, Outside City"
    4 = "Metro, City Status Unknown"
  ;
run;
The VALUE statement tends to be rather long given the number of items it defines. Remember, SAS code is generally free-form outside of required spaces and delimiters, along with the semicolon that ends every statement. Adopt a sound strategy for using indentation and line breaks to make code readable.
The VALUE statement requires the format name, which follows the SAS naming conventions of up to 32 characters, but with some special restrictions. Format names must meet an additional restriction of being distinct from the names of any formats supplied by SAS. Also, given that numbers are used to define format widths, a number at the end of a format name would create an ambiguity in setting lengths; therefore, format names cannot end with a number. If the format is for character values, the name must begin with $, and that character counts toward the 32-character limit.
In this format, individual values are set equal to their replacements (as literals) for all values intended to be formatted. Values other than 0, 1, 2, 3, and 4 may not appear as intended. For a discussion of displaying values other than those that appear in the VALUE statement, see Chapter Note 4 in Section 2.12.
The semicolon that ends the VALUE statement is set out on its own line here for readability, simply to make it easy to verify that it is present.
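The naming rules above also apply to character formats. The following is a sketch only, and it rests on several assumptions. It assumes that MortgageStatus is stored as a character variable whose values match the text shown in Output 2.4.8B exactly, including case and spacing. The format name $MortStat and the replacement labels are hypothetical. The OTHER keyword is one way to label values not listed in the VALUE statement; Chapter Note 4 in Section 2.12 has the full discussion of that topic.

```sas
proc format;
  /* $MortStat is a hypothetical name: it begins with $ because it formats */
  /* character values, and it does not end with a number                   */
  value $MortStat
    "N/A"                                           = "Not Applicable"
    "No, owned free and clear"                      = "Owned Outright"
    "Yes, contract to purchase"                     = "Contract to Purchase"
    "Yes, mortgaged/ deed of trust or similar debt" = "Mortgaged"
    other                                           = "Unlisted Value"
  ;
run;
```

If the assumptions hold, a statement such as format MortgageStatus $MortStat.; applies it in a procedure. Any stored value that does not match one of the quoted literals would display as Unlisted Value, which makes mismatches easy to spot.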
Submitting Program 2.5.1 creates a format named Metro in the format catalog in the Work library. The format takes effect only when it is used, and it is used in effectively the same manner as a format supplied by SAS. Program 2.5.2 uses the Metro format for the class variable Metro to alter the appearance of its values in Output 2.5.2. Note that since the variable Metro and the format Metro have the same name, and since no width is required, the only syntax element that distinguishes these to the SAS compiler is the required dot (.) in the format name.
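Because the Work library is temporary, a format created this way is not available in later sessions. As a sketch of the permanent storage mentioned at the start of Section 2.5, the LIBRARY= option in the PROC FORMAT statement writes the format to a catalog in another library. The library name MyFmts and its path are placeholders chosen here for illustration and are not part of the original programs.

```sas
/* Hypothetical library name and path: replace with a real, writable location */
libname MyFmts "/path/to/formats";

/* LIBRARY= stores the format in the MyFmts.Formats catalog, not Work.Formats */
proc format library=MyFmts;
  value Metro
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    2 = "Metro, Inside City"
    3 = "Metro, Outside City"
    4 = "Metro, City Status Unknown"
  ;
run;

/* Add MyFmts to the format search path so that SAS can locate the format */
options fmtsearch=(MyFmts);
```

In a later session, only the LIBNAME and OPTIONS statements would need to be resubmitted before using the format.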
Program 2.5.2: Using the Metro Format
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
  class Metro;
  var HHIncome;
  format Metro Metro.;
run;
Output 2.5.2: Using the Metro Format
Analysis Variable : HHIncome
METRO | N | Mean | Std Dev | Minimum | Maximum |
Not Identifiable | 92028 | 54800 | 52333 | -19998 | 1076000 |
Not in Metro Area | 230775 | 47856 | 45547 | -29997 | 1050000 |
Metro, Inside City | 154368 | 60328 | 70874 | -19998 | 1391000 |
Metro, Outside City | 340982 | 77648 | 75907 | -29997 | 1739770 |
Metro,
|