outstanding mortgage.
Program 2.6.1: Conditioning on a Variable Not Used in the Analysis
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
class Metro;
var HHIncome;
format Metro Metro.;
where Metro in (2,3,4)
and
MortgageStatus in
(‘Yes, contract to purchase’,
‘Yes, mortgaged/ deed of trust or similar debt’);
run;
Output 2.6.1: Conditioning on a Variable Not Used in the Analysis
Analysis Variable : HHIncome | |||||
METRO | N | Mean | Std Dev | Minimum | Maximum |
Metro, Inside City | 57881 | 86277 | 82749 | -19998 | 1361000 |
Metro, Outside City | 191021 | 96319 | 80292 | -29997 | 1266000 |
Metro, City Status Unknown | 167359 | 83879 | 72010 | -19998 | 1407000 |
The condition on the MortgageStatus variable is a bit daunting, particularly noting that matching character values is a precise operation. Seemingly simple differences like casing or spacing lead to values that are non-matching. Therefore, the literals used in Program 2.6.1 are specified to be an exact match for the data. In Section 3.9, functions are introduced that are useful in creating consistency among character values, along with others that allow for extraction and use of relevant portions of a string. However, the WHERE statement provides some special operators, shown in Table 2.6.4, that allow for simplification in these types of cases without the need to intervene with a function.
Table 2.6.4: Operators for General Comparisons
Symbol | Mnemonic | Logic |
? | CONTAINS | True result if the specified value is contained in the data value (character only). |
LIKE | True result if data value matches the specified value which may include wildcards. _ is any single character, % is any set of characters. |
Program 2.6.2 offers two methods for simplifying the condition on MortgageStatus, one using CONTAINS, the other using LIKE. Either reproduces Output 2.6.1.
Program 2.6.2: Conditioning on a Variable Using General Comparison Operators
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
class Metro;
var HHIncome;
format Metro Metro.;
where Metro in (2,3,4) and MortgageStatus contains ’Yes’;
run;
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
class Metro;
var HHIncome;
format Metro Metro.;
where Metro in (2,3,4) and MortgageStatus like ’%Yes%’;
run;
CONTAINS checks to see if the data value contains the string Yes; again, note that the casing must be correct to ensure a match. Also, ensure single or double quotation marks enclose the value to search for—in this case, without the quotation marks, Yes forms a legal variable name and is interpreted by the compiler as a reference to a variable.
LIKE allows for the use of wildcards as substitutes for non-essential character values. Here the % wildcard before and after Yes results in a true condition if Yes appears anywhere in the string and is thus logically equivalent to the CONTAINS conditioning above.
2.7 Using the FREQ Procedure for Categorical Summaries
To produce tables of frequencies and relative frequencies (percentages) like those shown for the case study in Outputs 2.2.3 and 2.2.4, the FREQ procedure is the tool of choice, and this section covers its fundamentals.
2.7.1 Choosing Analysis Variables in PROC FREQ
As in previous sections, the examples here use the IPUMS2005Basic SAS data set, so make sure the BookData library is assigned. As a first step, enter and submit Program 2.7.1. (Note that the use of labels has been re-established in the OPTIONS statement.)
Program 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement
options label;
proc freq data=BookData.IPUMS2005Basic;
table metro mortgageStatus;
run;
The TABLE statement allows for specification of the variables to summarize, and a space-delimited list of variables produces a one-way frequency table for each, as shown in Output 2.7.1.
Output 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement
Metropolitan status | ||||
METRO | Frequency | Percent | CumulativeFrequency | CumulativePercent |
0 | 92028 | 7.94 | 92028 | 7.94 |
1 | 230775 | 19.91 | 322803 | 27.85 |
2 | 154368 | 13.32 | 477171 | 41.17 |
3 | 340982 | 29.42 | 818153 | 70.59 |
4 | 340909 | 29.41 | 1159062 | 100.00 |
MortgageStatus | Frequency | Percent | CumulativeFrequency | CumulativePercent |
N/A | 303342 | 26.17 | 303342 | 26.17 |
No, owned free and clear | 300349 | 25.91 | 603691 | 52.08 |
Yes, contract to purchase | 9756 | 0.84 | 613447 | 52.93 |
Yes, mortgaged/ deed of trust or similar debt | 545615 | 47.07 | 1159062 | 100.00 |
The TABLE statement is not required; however, in that case, the default behavior produces a one-way frequency table for every variable in the data set. Therefore, both types of SAS variables, character or numeric, are legal in the TABLE statement. Given that variables listed in the TABLE statement are treated as categorical (in the same manner as variables listed in the CLASS statement in PROC MEANS), it is best to have the summary variables be categorical or be formatted into a set of categories.
The default summaries in a one-way frequency table are: frequency (count), percent, cumulative frequency, and cumulative percent. Of course, the cumulative statistics only make sense if the categories are ordinal, which these are not. Many options are available in the table statement to control what is displayed, and one is given in Program 2.7.2 to remove the cumulative statistics.
Program 2.7.2: PROC FREQ Option for Removing Cumulative Statistics
proc freq data=BookData.IPUMS2005Basic;
table metro mortgageStatus / nocum;
run;
As with the CLASS statement in the MEANS procedure, variables listed in the TABLE statement in PROC FREQ use the format provided with the variable to construct the categories. Program 2.7.3 uses a format defined in Program 2.5.6 to bin the MortgagePayment variable into categories and, as this is an ordinal set, the cumulative statistics are appropriate.
Program 2.7.3: Using a Format to Control Categories for a Variable in the TABLE Statement
proc format;
value Mort
0=’None’
1-350=”$350 and Below”
351-1000=”$351 to $1000”
1001-1600=”$1001 to $1600”
1601-high=”Over $1600”
;
run;
proc freq data=BookData.IPUMS2005Basic;
table MortgagePayment;
format