James Blum

Fundamentals of Programming in SAS



      For this case, a simplified format that distinguishes metro, non-metro, and non-identifiable observations may be desired. Program 2.5.3 contains two approaches to this, the first clearly being the more efficient of the two.

      Program 2.5.3: Assigning Multiple Values to the Same Formatted Value

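      proc format;
        /* MetroB: a comma-separated list maps several values to one label */
        value MetroB
          0 = "Not Identifiable"
          1 = "Not in Metro Area"
          2,3,4 = "In a Metro Area"
        ;
        /* MetroC: the same mapping, written as one assignment per value */
        value MetroC
          0 = "Not Identifiable"
          1 = "Not in Metro Area"
          2 = "In a Metro Area"
          3 = "In a Metro Area"
          4 = "In a Metro Area"
        ;
      run;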

       In the MetroB format, a comma-separated list of values appears on the left side of an assignment. This is legal and assigns the formatted value to each listed data value.

       The MetroC format accomplishes the same result; however, the literal values on the right side of its assignments must be exactly the same. Differences in even simple items like spacing or casing result in different formatted values.
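
      As a hedged illustration of the point about exact matching, the sketch below defines a hypothetical format, MetroTypo (not part of the original programs), in which two labels differ from "In a Metro Area" only in casing or spacing. The DATA step writes each formatted value to the log so the three spellings can be compared; a procedure that groups by formatted value would be expected to treat them as separate categories.

```sas
proc format;
  value MetroTypo  /* hypothetical format, for illustration only */
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    2 = "In a Metro Area"
    3 = "In a metro area"    /* lowercase m: a different label */
    4 = "In a  Metro Area"   /* two spaces: also a different label */
  ;
run;

data _null_;
  do Metro = 0 to 4;
    Label = put(Metro, MetroTypo.);  /* formatted value as a character string */
    put Metro= Label=;               /* write the value and its label to the log */
  end;
run;
```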

      Either format given in Program 2.5.3 can replace the Metro format in Program 2.5.2 to create the result in Output 2.5.3.

      Output 2.5.3: Assigning Multiple Values to the Same Formatted Value

Analysis Variable : HHIncome

METRO                    N     Mean  Std Dev  Minimum  Maximum
Not Identifiable     92028    54800    52333   -19998  1076000
Not in Metro Area   230775    47856    45547   -29997  1050000
In a Metro Area     836259    69024    71495   -29997  1739770

      It is also possible to use the dash character as an operator, in the form ValueA-ValueB, to define a range on the left side of any assignment; this assigns the formatted value to every data value between ValueA and ValueB, inclusive. Program 2.5.4 gives an alternative way to construct the formats given in Program 2.5.3, and the resulting format can also be placed into Program 2.5.2 to produce Output 2.5.3.

      Program 2.5.4: Assigning a Range of Values to a Single Formatted Value

      proc format;
        value MetroD
          0 = "Not Identifiable"
          1 = "Not in Metro Area"
          2-4 = "In a Metro Area"
        ;
      run;
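
      One consequence of the inclusive range is worth a small sketch. A list such as 2,3,4 matches only those three values, while the range 2-4 matches every value from 2 through 4. The step below redefines MetroB and MetroD so that it runs on its own, and it uses 2.5 as a hypothetical test value that is not claimed to occur in the Metro data. Under MetroB the value 2.5 has no matching assignment, so it should appear in the log as a number rather than a label.

```sas
proc format;
  value MetroB
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    2,3,4 = "In a Metro Area"
  ;
  value MetroD
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    2-4 = "In a Metro Area"
  ;
run;

data _null_;
  do Value = 2, 2.5, 4;                /* 2.5 is a hypothetical test value */
    ListLabel  = put(Value, MetroB.);  /* list: matches only 2, 3, and 4 */
    RangeLabel = put(Value, MetroD.);  /* range: matches 2 through 4, inclusive */
    put Value= ListLabel= RangeLabel=;
  end;
run;
```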

      Certain keywords are also available for use on the left side of an assignment, one of which is OTHER. OTHER assigns its formatted value to any data value not listed on the left side of another assignment in the format definition. Program 2.5.5 uses OTHER to give another method for creating a format that can be used to generate Output 2.5.3. It is important to note that using OTHER safely often requires detailed knowledge of exactly which values are present in the data set, because every unlisted value, including a missing value, receives the OTHER label.

      Program 2.5.5: Assigning a Range of Values to a Single Formatted Value

      proc format;
        value MetroE
          0 = "Not Identifiable"
          1 = "Not in Metro Area"
          other = "In a Metro Area"
        ;
      run;
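
      To see why OTHER calls for knowledge of the data, the following sketch applies MetroE (redefined here so the step is self-contained) to a few test values. The values 9 and missing are hypothetical inputs chosen for illustration. Since neither is listed in the format, both should fall under OTHER and be labeled as metro-area observations.

```sas
proc format;
  value MetroE
    0 = "Not Identifiable"
    1 = "Not in Metro Area"
    other = "In a Metro Area"
  ;
run;

data _null_;
  do Metro = 0, 1, 2, 9, .;      /* 9 and missing are hypothetical inputs */
    Label = put(Metro, MetroE.); /* unlisted values fall under OTHER */
    put Metro= Label=;
  end;
run;
```

      If unexpected or missing values are possible, listing them explicitly in the format definition keeps them out of the OTHER category.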

      In general, value ranges should be non-overlapping, and the < symbol—called an exclusion operator in this context—can be used at either end (or both ends) of the dash to indicate the value should not be included in the range. Overlapping ranges are discussed in Chapter Note 5 in Section 2.12. Using exclusion operators to create non-overlapping ranges allows for the categorization of a quantitative variable without having to know the precision of measurement. Program 2.5.6 gives two variations on creating bins for the MortgagePayment data and uses those bins as classes in PROC MEANS, with the results shown in Output 2.5.6A and Output 2.5.6B.

      Program 2.5.6: Binning a Quantitative Variable Using a Format

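      proc format;
        /* Mort: integer endpoints, relying on payments recorded in whole dollars */
        value Mort
          0 = 'None'
          1-350 = "$350 and Below"
          351-1000 = "$351 to $1000"
          1001-1600 = "$1001 to $1600"
          1601-high = "Over $1600"
        ;
        /* MortB: exclusion operators keep the bins from overlapping at shared endpoints */
        value MortB
          0 = 'None'
          1-350 = "$350 and Below"
          350<-1000 = "Over $350, up to $1000"
          1000<-1600 = "Over $1000, up to $1600"
          1600<-high = "Over $1600"
        ;
      run;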

      proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
        class MortgagePayment;
        var HHIncome;
        format MortgagePayment Mort.;
      run;

      proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
        class MortgagePayment;
        var HHIncome;
        format MortgagePayment MortB.;
      run;

       The keywords LOW and HIGH are available so that the maximum and minimum values need not be known. When applied to character data, LOW and HIGH refer to the sorted alphanumeric values. Note that the LOW keyword excludes missing values for numeric variables but includes missing values for character variables.

       In the Mort format, the range endpoints exploit the fact that the mortgage payments are reported to the nearest dollar.

       In the MortB format, using the < symbol to exclude the starting value of each range allows the bins to be mutually exclusive and exhaustive irrespective of the precision of the data values. The exclusion operator, <, omits the adjacent value from the range, so that 350<-1000 omits only 350, 350-<1000 omits only 1000, and 350<-<1000 omits both 350 and 1000.

       When a format is present for a class variable, the format is used to construct the unique values for each category, and this behavior persists in most cases where SAS treats a variable as categorical.
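
      The difference between the two binning strategies can be checked with a short sketch. It redefines Mort and MortB so that it runs on its own and applies both to a few test values. The fractional values 350.5 and 1000.25 are hypothetical; they are not claimed to occur in the MortgagePayment data, which are reported to the nearest dollar. Under Mort these values fall in the gaps between ranges and should come back unlabeled, while under MortB each should land in the bin that starts just above 350 or 1000.

```sas
proc format;
  value Mort
    0 = 'None'
    1-350 = "$350 and Below"
    351-1000 = "$351 to $1000"
    1001-1600 = "$1001 to $1600"
    1601-high = "Over $1600"
  ;
  value MortB
    0 = 'None'
    1-350 = "$350 and Below"
    350<-1000 = "Over $350, up to $1000"
    1000<-1600 = "Over $1000, up to $1600"
    1600<-high = "Over $1600"
  ;
run;

data _null_;
  do Payment = 350, 350.5, 1000, 1000.25;  /* fractional values are hypothetical */
    IntegerBins   = put(Payment, Mort.);   /* ranges with whole-dollar endpoints */
    ExclusionBins = put(Payment, MortB.);  /* ranges built with the < operator */
    put Payment= IntegerBins= ExclusionBins=;
  end;
run;
```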

      Output 2.5.6A: Binning a Quantitative Variable Using the Mort Format

Analysis Variable : HHIncome

MortgagePayment        N     Mean  Std Dev  Minimum  Maximum
None              603691    45334    53557   -22298  1739770
$350 and Below     59856    47851    42062   -16897   841000
$351 to $1000     283111    64992    45107   -19998  1060000
$1001 to $1600    128801    96107    63008   -29997  1125000
Over $1600         83603   153085   117134   -29997  1407000

      Output 2.5.6B: Binning a Quantitative Variable Using the MortB Format

Analysis Variable : HHIncome

MortgagePayment        N     Mean  Std Dev  Minimum  Maximum
None              603691    45334    53557   -22298  1739770
$350 and Below     59856    47851    42062   -16897   841000
Over