James Blum

Fundamentals of Programming in SAS



initializes to zero and sets to one at the first instance of certain non-syntax errors. Details about the errors it tracks are discussed in Chapter Note 10 in Section 2.12. Automatic variables are not written to the resulting data set, though their values can be assigned to new variables or used in other DATA step programming statements.
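      As a minimal sketch of that last point, the DATA step below copies _N_ and _ERROR_ into ordinary variables so their values survive in the output data set. The data set WORK.AUTOVARS, the variables Value, Iteration, and ErrFlag, and the three in-stream records are hypothetical, chosen only for illustration. The third record is not a valid number, so SAS should report invalid data for Value on that line and set _ERROR_ to one for that iteration only.

```sas
/* Hypothetical example: keep copies of the automatic variables */
data work.autovars;
  input Value;
  Iteration = _N_;      /* _N_ itself is not written to WORK.AUTOVARS */
  ErrFlag   = _ERROR_;  /* the copy is kept; _ERROR_ itself is not */
  datalines;
10
20
abc
;
run;
```

A PROC PRINT of WORK.AUTOVARS should list Value, Iteration, and ErrFlag, with no columns for _N_ or _ERROR_.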

      Example

      Some aspects of the compilation and execution phases are demonstrated below using a raw data set having the five variables shown in Input Data 2.9.1: flight number (FlightNum), flight date (Date), destination city (Destination), number of first-class passengers (FirstClass), and number of economy passengers (EconClass). The first line contains a ruler to help locate the values; it is not included in the raw file. Program 2.9.1 reads in this data set.

      Input Data 2.9.1: Flights.prn data set

----+----1----+----2----+----3
439 12/11/2000 LAX 20 137
921 12/11/2000 DFW 20 131
114 12/12/2000 LAX 15 170
982 12/12/2000 dfw 5 85
439 12/13/2000 LAX 14 196
982 12/13/2000 DFW 15 116
431 12/14/2000 LaX 17 166
982 12/14/2000 DFW 7 88
114 12/15/2000 LAX 0 187
982 12/15/2000 DFW 14 31

      Program 2.9.1: Demonstrating the Input Buffer and Program Data Vector (PDV)

      data work.flights;
        infile RawData('flights.prn');
        input FlightNum Date $ Destination $ FirstClass EconClass;
      run;

      During the compilation phase, SAS scans each statement for syntax errors and, finding none in this code, creates various elements as each statement compiles. The DATA statement triggers the initial creation of the PDV with the two automatic variables: _N_ and _ERROR_. SAS then determines that an input buffer is necessary when it encounters the INFILE statement. SAS automatically allocates the maximum amount of memory, 32,767 bytes, when it creates the input buffer; if explicit control is needed, the LRECL= option in the INFILE statement specifies a different value. Compilation of the INPUT statement completes the PDV, adding the five variables named in the INPUT statement, in the order in which they are encountered, along with their attributes.
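      A minimal sketch of setting the record length explicitly is shown below. To keep the example self-contained, it assumes a temporary file created on the fly in place of the book's RawData fileref and Flights.prn (only the first two records are written to it), and the value 256 for LRECL= is an arbitrary choice for illustration.

```sas
/* Hypothetical setup: a temporary file stands in for flights.prn */
filename RawData temp;

data _null_;
  file RawData;
  put '439 12/11/2000 LAX 20 137';
  put '921 12/11/2000 DFW 20 131';
run;

data work.flights;
  infile RawData lrecl=256;  /* explicit logical record length */
  input FlightNum Date $ Destination $ FirstClass EconClass;
run;
```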

      A visual representation of the input buffer and PDV at various points in the compilation and execution phases is given in Tables 2.9.1 through 2.9.10. For the sake of brevity, the input buffer is shown with 26 columns here; its actual size is 32,767 columns.

      Table 2.9.1: Representation of Input Buffer During Compilation Phase

Column: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

      Table 2.9.2: Representation of the PDV at the Completion of the Compilation Phase

_N_  _ERROR_  FlightNum  Date  Destination  FirstClass  EconClass

      Note that while SAS is not case-sensitive when referencing variables, it stores each variable name with the capitalization used in its first reference.
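      A small sketch of this behavior, using a hypothetical data set WORK.CASEDEMO: the first assignment fixes the stored spelling of the name, and the second assignment, written in uppercase, refers to the same variable rather than creating a new one.

```sas
/* Hypothetical example: the first reference sets the stored name */
data work.casedemo;
  flightnum = 439;  /* first reference: name stored as "flightnum" */
  FLIGHTNUM = 921;  /* same variable; names are not case-sensitive */
run;

proc contents data=work.casedemo;
run;
```

PROC CONTENTS should list a single variable named flightnum, and its value in the one observation is 921.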

      Once the compilation phase has completed, the execution phase begins by initializing the variables in the PDV. Each of the user-defined variables in this example comes from a raw data file, so they are all initialized to missing. Recall that missing numeric values are represented by a single period and missing character values by a blank. The automatic variables _N_ and _ERROR_ are initialized to one and zero, respectively, since this is the first iteration through the DATA step and no errors tracked by _ERROR_ have been encountered.
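      One way to watch this initialization is to write the whole PDV to the log with PUTLOG _ALL_ before and after the INPUT statement. This is a minimal sketch that assumes the first two records of Flights.prn are supplied as in-stream data rather than read through RawData.

```sas
data work.flights;
  putlog 'Before INPUT: ' _all_;  /* PDV right after initialization */
  input FlightNum Date $ Destination $ FirstClass EconClass;
  putlog 'After INPUT: ' _all_;   /* PDV once the record is parsed */
  datalines;
439 12/11/2000 LAX 20 137
921 12/11/2000 DFW 20 131
;
run;
```

On the first iteration, the line written before the INPUT statement should show _N_=1, _ERROR_=0, and missing values for all five input variables. A third "Before INPUT" line also appears, because the DATA step begins a third iteration before the INPUT statement finds no more data and the step stops.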

      Table 2.9.3: Representation of the PDV at the Beginning of the Execution Phase

_N_  _ERROR_  FlightNum  Date  Destination  FirstClass  EconClass
1    0        .                             .           .

      When SAS encounters the INPUT statement on the first iteration of the DATA step, it reads the first line of data and places it in the input buffer with each character in its own column. Table 2.9.4 illustrates this for the first record of Flights.prn.
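      The contents of the input buffer can also be inspected directly through the automatic variable _INFILE_, which holds the current record once an INPUT statement has read it. A minimal sketch, again assuming the first record is supplied as in-stream data so the step runs on its own:

```sas
data work.flights;
  infile datalines;
  input FlightNum Date $ Destination $ FirstClass EconClass;
  putlog 'Input buffer: ' _infile_;  /* the raw record held in the buffer */
  datalines;
439 12/11/2000 LAX 20 137
;
run;
```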

      Table 2.9.4: Illustration of the Input Buffer After Reaching the INPUT Statement on the First Iteration

Column: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Value:   4  3  9     1  2  /  1  1  /  2  0  0  0     L  A  X     2  0     1  3  7

      To move raw data from the input buffer to the PDV, SAS must parse the character string from the input buffer to determine which characters should be grouped together and whether any of these character groupings must be converted to numeric values. The parsing process uses information found in the INFILE statement (for example, the DSD and DLM= options), the INPUT statement (such as the $ for character data), and other sources (like the LENGTH statement) to determine the values it places in the PDV.
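      A minimal sketch of two of these sources at work, assuming the first record is supplied as in-stream data: DLM=' ' merely restates the default blank delimiter, while the LENGTH statement changes what reaches the PDV. Without it, a character variable read with simple list input gets a default length of 8 bytes, so Date would hold 12/11/20 rather than 12/11/2000.

```sas
data work.flights;
  length Date $ 10;          /* overrides the 8-byte default for list input */
  infile datalines dlm=' ';  /* blank delimiter, the same as the default */
  input FlightNum Date $ Destination $ FirstClass EconClass;
  datalines;
439 12/11/2000 LAX 20 137
;
run;
```

Because the LENGTH statement is now the first reference to Date, Date also moves ahead of FlightNum in the PDV.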

      No delimiter options are present in the INFILE statement in Program 2.9.1; thus, a space is used as the default delimiter, and the first variable, FlightNum, is read using simple list input. SAS uses column pointers to keep track of where the parsing begins and ends for each variable and, with simple list input, SAS begins in the first column and scans until the first non-delimiter character is found. In this record, the first column is non-blank, so the starting pointer is placed there, indicated by the blue triangle below Table 2.9.5. Next, SAS scans until it finds another delimiter (a blank in this case), which is indicated below Table 2.9.5 with the red octagon. Thus, when reading the input buffer to create FlightNum, SAS has read from column 1 up to column 4.

      Table 2.9.5: Column Pointers at the Starting and Ending Positions for Parsing FlightNum

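Column: 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Value:   4  3  9     1  2  /  1  1  /  2  0  0  0     L  A  X     2  0     1  3  7
Starting pointer (blue triangle): column 01. Ending pointer (red octagon): column 04.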

      Based on information defined in the descriptor portion of the data during compilation of the INPUT statement, FlightNum is a numeric variable with a default length of eight bytes. There are no additional instructions on how this value should be handled, so SAS converts the extracted character string “439” to the number 439 and sends it to the PDV. Note that the blank found in column 4 is not part of the parsed value—only non-delimiter columns are included. Table 2.9.6 shows the results of parsing FlightNum from the first record.
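      The conversion step can be mimicked with the INPUT function. This is offered only as an analogy, not as what the INPUT statement does internally; the data set WORK.CONVERT and the variables Extracted and FlightNumLen are hypothetical.

```sas
data work.convert;
  Extracted    = '439';              /* characters parsed from columns 1-3 */
  FlightNum    = input(Extracted, 8.);  /* character string to number */
  FlightNumLen = vlength(FlightNum);    /* storage length of the numeric variable */
run;
```

FlightNum should hold the number 439 and FlightNumLen should be 8, the default numeric length.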

      Table 2.9.6: Representation of the PDV After FlightNum is Read During the First Iteration

_N_  _ERROR_  FlightNum  Date  Destination  FirstClass  EconClass
1    0        439                           .           .

      Before parsing begins for the next value, SAS advances the column pointer one position, in this case advancing to column 5. This prevents SAS from beginning the next value in a column that was used to create a previous value. This automatic advancement of a single column occurs regardless of the input style, even though it is only demonstrated here for simple list input.
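      The INFILE option COLUMN= offers a way to observe the column pointer directly. In this sketch (the variable names PtrCol and AfterFlightNum are hypothetical, and the first record is supplied as in-stream data), a trailing @ holds the record after FlightNum is read so the pointer position at that moment can be captured before the remaining values are parsed. Comparing AfterFlightNum with the column positions in Table 2.9.5 shows where the scan for Date begins.

```sas
data work.pointer;
  infile datalines column=PtrCol;  /* PtrCol holds the current pointer column */
  input FlightNum @;               /* trailing @ keeps the record in the buffer */
  AfterFlightNum = PtrCol;         /* pointer position once FlightNum is read */
  input Date $ Destination $ FirstClass EconClass;
  datalines;
439 12/11/2000 LAX 20 137
;
run;
```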

      Since Date is also read in using simple list input,