$1000, up to $1600
2.5.2 Permanent Storage and Inspection of Defined Formats
Formats can be permanently stored in a catalog (with the default name of Formats) in any assigned SAS library via the use of the LIBRARY= option in the PROC FORMAT statement. As an example, consider Program 2.5.7, which is a revision and extension of Program 2.5.6.
Program 2.5.7: Revisiting Program 2.5.6, Adding LIBRARY= and FMTLIB Options
proc format library=sasuser;
value Mort
0=’None’
1-350=”$350 and Below”
351-1000=”$351 to $1000”
1001-1600=”$1001 to $1600”
1601-high=”Over $1600”
;
value MortB
0=’None’
1-350=”$350 and Below”
350<-1000=”Over $350, up to $1000”
1000<-1600=”Over $1000, up to $1600”
1600<-high=”Over $1600”
;
run;
proc format fmtlib library=sasuser;
run;
Using the LIBRARY= option in this manner places the format definitions into the Formats catalog in the Sasuser library and accessing them in subsequent coding sessions requires the system option FMTSEARCH=(SASUSER) to be specified prior to their use. An alternate format catalog can also be used via two-level naming of the form libref.catalog, with the catalog being created if it does not already exist. Any catalog in any library that contains stored formats to be used in a given session can be listed as a set inside the parentheses following the FMTSEARCH= option. Those listed are searched in the given order, with WORK.FORMATS being defined implicitly as the first catalog to be searched unless it is included explicitly in the list.
The FMTLIB option shows information about the formats in the chosen library in the Output window, Output 2.5.7 shows the results for this case.
Output 2.5.7: Revisiting Program 2.5.6, Adding LIBRARY= and FMTLIB Options
The top of the table includes general information about the format, including the name, various lengths, and number of format categories. The default length corresponds to the longest format label set in the VALUE statement. The rows below have columns for each format label and the start and end of each value range. Note that the first category in each of these formats is assigned to a range, even though it only contains a single value, with the start and end values being the same. The use of < as an exclusion operator is also shown in ranges where it is used, and the keyword HIGH is left-justified in the column where it is used. Note the exclusion operation is applied to the value of 1600 at the low end of the range, it is a syntax error to attempt to apply it to the keyword HIGH (or LOW).
2.6 Subsetting with the WHERE Statement
In many cases, only a subset of the data is used, with the subsetting criteria based on the values of variables in the data set. In these cases, using the WHERE statement allows conditions to be set which choose the records a SAS procedure processes while ignoring the others—no modification to the data set itself is required. If the OBS= data set option is in use, the number chosen corresponds to the number of observations meeting the WHERE condition.
In order to use the WHERE statement, it is important to understand the comparison and logical operators available. Basic comparisons like equality or various inequalities can be done with symbolic or mnemonic operators—Table 2.6.1 shows the set of comparison operators.
Table 2.6.1: Comparison Operators
Operation | Symbol | Mnemonic |
Equal | = | EQ |
Not Equal | ^= | NE |
Less Than | < | LT |
Less Than or Equal | <= | LE |
Greater Than | > | GT |
Greater Than or Equal | >= | GE |
In addition to comparison operators, Boolean operators for negation and compounding (along with some special operators) are also available—Table 2.6.2 summarizes these operators.
Table 2.6.2: Boolean and Associated Operators
Symbol | Mnemonic | Logic |
& | AND | True result if both conditions are true |
| | OR | True result if either, or both, conditions are true |
IN | True if matches any element in a list | |
BETWEEN-AND | True if in a range of values (including endpoints) | |
~ | NOT | Negates the condition that follows |
Revisiting Program 2.5.2 and Output 2.5.2, subsetting the results to only include observations known to be in a metro area can be accomplished with any one of the following WHERE statements.
where Metro eq 2 or Metro eq 3 or Metro eq 4;
where Metro ge 2 and Metro le 4;
where Metro in (2,3,4);
where Metro between 2 and 4;
where Metro not in (0,1);
Each possible value can be checked by using the OR operator between equality comparisons for each possible value. When using OR, each comparison must be complete/specific. For example, it is not legal to say: Metro eq 2 or eq 3 or eq 4. It is legal, but unhelpful, to say Metro eq 2 or 3 or 4, as SAS uses numeric values for truth (since it does not include Boolean variables). The values 0 and missing are false, while any other value is true; hence, Metro eq 2 or 3 or 4 is an immutably true condition.
This conditioning takes advantage of the fact that the desired values fall into a range. As with OR, each condition joined by the AND must be complete; again, it is not legal to say: Metro ge 2 and le 4. Also, with knowledge of the values of Metro, this condition could have been simplified to Metro ge 2. However, good programming practice dictates that specificity is preferred to avoid incorrect assumptions about data values.
IN allows for simplification of a set of conditions that might otherwise be written using the OR operator, as was done in . The list is given as a set of values separated by commas or spaces and enclosed in parentheses.
BETWEEN-AND allows for simplification of a value range that can otherwise be written using AND between appropriate comparisons, as was done in .
The NOT operator allows the truth condition to be made the opposite of what is specified. This is a slight improvement over , as the list of values not desired is shorter than the list of those that are.
Adding any of these WHERE statements (or any other logically equivalent WHERE statement) to Program 2.5.2 produces the results shown in Table 2.6.3.
Table 2.6.3: Using WHERE to Subset Results to Specific Values of the Metro Variable
Analysis Variable : HHIncome | |||||
METRO | N | Mean | Std Dev | Minimum | Maximum |
Metro, Inside City | 154368 | 60328 | 70874 | -19998 | 1391000 |
Metro, Outside City | 340982 | 77648 | 75907 | -29997 | 1739770 |
Metro, City Status Unknown | 340909 | 64335 | 66110 | -22298 | 1536000 |
The tools available allow for conditioning on more than one variable, and the variable(s) conditioned on need only be in the data set in use and do not have to be present in the output generated. In Program 2.6.1, the output is conditioned