Metro;
label HomeValue='Value of Home ($)' state='State';
run;
By default, the output from PROC PRINT includes an Obs column, which is simply the row number for the record—the NOOBS option in the PROC PRINT statement suppresses this column.
Most SAS procedures use labels when they are provided or assigned; however, PROC PRINT defaults to using variable names. To use labels, the LABEL option is provided in the PROC PRINT statement. See Chapter Note 1 in Section 2.12 for more details.
The LABEL statement assigns labels to selected variables. The general syntax is: LABEL variable1='label1' variable2='label2' …; where the labels are given as literal values in either single or double quotation marks, as long as the opening and closing quotation marks match.
Output 2.3.2: Assigning Labels
State | MortgageStatus | MortgagePayment | Value of Home ($) | METRO |
South Carolina | Yes, mortgaged/ deed of trust or similar debt | 200 | 32500 | 4 |
North Carolina | No, owned free and clear | 0 | 5000 | 1 |
South Carolina | Yes, mortgaged/ deed of trust or similar debt | 360 | 75000 | 4 |
South Carolina | Yes, contract to purchase | 430 | 22500 | 3 |
North Carolina | Yes, mortgaged/ deed of trust or similar debt | 450 | 65000 | 4 |
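The quotation-mark rule for the LABEL statement can be illustrated with a minimal sketch. It assumes the SASHELP.CLASS sample data set that ships with SAS, which is not one of the book's data sets; the label text is invented for illustration. Double quotation marks are used for one label so that it can contain an apostrophe.

```sas
/* Illustrative only: sashelp.class is a SAS sample data set, not the book's data. */
proc print data=sashelp.class(obs=3) noobs label;
  var Name Height Weight;
  /* Double quotation marks allow the label text to contain an apostrophe. */
  label Height="Student's Height (in)"
        Weight='Weight (lb)';
run;
```

Removing the LABEL option from the PROC PRINT statement in this sketch should cause the column headings to revert to the variable names, even though the LABEL statement is still present.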
In addition to using labels to alter the display of variable names, altering the display of data values is possible with formats. The general form of a format reference is:
<$>format<w>.<d>
The <> symbols denote a portion of the syntax that is sometimes used or required; the <> characters themselves are not part of the syntax. The dollar sign is required for any format that applies to a character variable (character formats) and is not permitted in formats used for numeric variables (numeric formats). The w value is the total number of characters (width) available for the formatted value, while d controls the number of digits displayed after the decimal point for numeric formats. The dot is required in all format assignments, and in many cases is the means by which the SAS compiler can distinguish between a variable name and a format name. The value of format is called the format name; however, the standard numeric and character formats have a null name. For example, the 5.2 format assigns the standard numeric format with a total width of 5 and up to 2 digits displayed past the decimal. Program 2.3.3 uses the FORMAT statement to apply formats to the HomeValue, MortgagePayment, and MortgageStatus variables.
Program 2.3.3: Assigning Formats
proc print data=bookdata.ipums2005mini(obs=5) noobs label;
var state MortgageStatus MortgagePayment HomeValue Metro;
label HomeValue='Value of Home' state='State';
format HomeValue MortgagePayment dollar9. MortgageStatus $1.;
run;
In the FORMAT statement, a list of one or more variables is followed by a format specification. Both HomeValue and MortgagePayment are assigned a dollar format with a total width of nine—any commas and dollar signs inserted by this format count toward the total width.
The MortgageStatus variable is character and can only be assigned a character format. The $1. format is the standard character format with width one, which truncates the display of MortgageStatus to one letter, but does not alter the actual value. In general, formats assigned in procedures are temporary and only apply to the output for the procedure.
Output 2.3.3: Assigning Formats
State | MortgageStatus | MortgagePayment | Value of Home | METRO |
South Carolina | Y | $200 | $32,500 | 4 |
North Carolina | N | $0 | $5,000 | 1 |
South Carolina | Y | $360 | $75,000 | 4 |
South Carolina | Y | $430 | $22,500 | 3 |
North Carolina | Y | $450 | $65,000 | 4 |
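The pieces of a format reference can also be explored outside of PROC PRINT. The following is a minimal sketch, not taken from the book, that writes values to the SAS log with a PUT statement in a DATA _NULL_ step; the variable names amount and status and their values are hypothetical.

```sas
/* Illustrative only: amount and status are made-up values, not book data. */
data _null_;
  amount = 1234.5;
  status = 'Yes, contract to purchase';
  put amount 8.2;        /* standard numeric format (null name): width 8, 2 decimals */
  put amount dollar9.;   /* named numeric format: width 9, no decimal places */
  put amount dollar9.2;  /* same format name with 2 decimal places */
  put status $1.;        /* standard character format: width 1 */
  put status;            /* no format: the stored value is written in full */
run;
```

The final PUT statement is included to show that applying $1. changes only what is displayed; the stored value of status is not shortened.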
2.3.2 PROC SORT and BY-Group Processing
Rows in a data set can be reordered using the SORT procedure to sort the data on the values of one or more variables in ascending or descending order. Program 2.3.4 sorts the BookData.Ipums2005Mini data set by the HomeValue variable.
Program 2.3.4: Sorting Data with the SORT Procedure
proc sort data=bookdata.ipums2005mini out=work.sorted;
by HomeValue;
run;
proc print data=work.sorted(obs=5) noobs label;
var state MortgageStatus MortgagePayment HomeValue Metro;
label HomeValue='Value of Home' state='State';
format HomeValue MortgagePayment dollar9. MortgageStatus $1.;
run;
The default behavior of the SORT procedure is to replace the input data set, specified in the DATA= option, with the sorted data set. To create a new data set from the sorted observations, use the OUT= option.
The BY statement is required in PROC SORT and must name at least one variable. As shown in Output 2.3.4, the rows are now ordered in increasing levels of HomeValue.
Output 2.3.4: Sorting Data with the SORT Procedure
State | MortgageStatus | MortgagePayment | Value of Home | METRO |
North Carolina | N | $0 | $5,000 | 1 |
South Carolina | Y | $430 | $22,500 | 3 |
North Carolina | Y | $300 | $22,500 | 3 |
South Carolina | Y | $200 | $32,500 | 4 |
North Carolina | N | $0 | $45,000 | 1 |
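The difference between sorting in place and sorting with the OUT= option can be sketched as follows. This example assumes the SASHELP.CLASS sample data set rather than the book's data, and the data set names work.class and work.byAge are hypothetical. A copy is made first because the SASHELP library is typically read-only.

```sas
/* Copy the sample data into WORK so that it can be modified safely. */
data work.class;
  set sashelp.class;
run;

/* No OUT= option: work.class itself is replaced by its sorted version. */
proc sort data=work.class;
  by Height;
run;

/* OUT= option: work.class is left unchanged; sorted rows go to work.byAge. */
proc sort data=work.class out=work.byAge;
  by Age;
run;
```

After both steps, work.class should remain ordered by Height while work.byAge holds the same rows ordered by Age.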
Sorting on more than one variable gives a nested or hierarchical sorting. In those cases, records are ordered on the first variable; then, within each group of records sharing the same value of the first variable, records are sorted on the second variable, and so forth. A specification of ascending (the default) or descending order is made for each variable. Program 2.3.5 sorts the BookData.Ipums2005Mini data set on three variables present in the data set.
Program 2.3.5: Sorting on Multiple Variables
proc sort data=bookdata.ipums2005mini out=work.sorted;
by MortgagePayment descending State descending HomeValue;
run;
proc print data=work.sorted(obs=6) noobs label;
var state MortgageStatus MortgagePayment HomeValue Metro;
label HomeValue='Value of Home' state='State';
format HomeValue MortgagePayment dollar9. MortgageStatus $1.;
run;
The first sort is on MortgagePayment, in ascending order. Since 0 is the lowest value and that value occurs on six records in the data set, Output 2.3.5 shows one block of records with MortgagePayment 0.
The next sort is on State in descending order—note that the DESCENDING option precedes the variable it applies to. For the six records shown in Output 2.3.5, the first three are South Carolina and the final three are North Carolina—descending alphabetical order. Note, when sorting character data, casing matters—uppercase values are before lowercase in such a sort. For more details about determining the sort order of character data, see Chapter Note 2 in Section 2.12.
The final sort is on HomeValue, also in descending order—note that the DESCENDING option must precede each variable it applies to. So, within each State group in Output 2.3.5, values of the HomeValue variable are in descending order.
Output 2.3.5: Sorting on Multiple Variables
State | MortgageStatus | MortgagePayment | Value
|