how FlightNum was read. SAS begins at column 5 and reads until it encounters the next delimiter, which is in column 15. Table 2.9.7 shows this with, as before, the blue triangle indicating the starting column and the red octagon the ending column.
Table 2.9.7: Column Pointers at the Starting and Ending Positions for Parsing FlightNum
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
4 | 3 | 9 |  | 1 | 2 | / | 1 | 1 | / | 2 | 0 | 0 | 0 |  |  | L | A | X |  | 2 | 0 |  | 1 | 3 | 7 |
The characters read for Date are “12/11/2000”, and SAS must parse this string using the provided instructions, which are given by the dollar sign used in the INPUT statement for this variable. The dollar sign declares the variable's type as character and, since no length is specified, the default length of eight bytes is used. Unlike numeric variables, where the length attribute does not control the displayed width, the length of a character variable is typically equal to the number of characters that can be stored. (Check the SAS Documentation to determine how length is related to printed width for character values in various languages and encodings.) The resulting value for Date, shown in Table 2.9.8, has been truncated to the first eight characters. This highlights that while list input reads from delimiter to delimiter, it stores the parsed value subject to any attributes, such as type and length, previously established.
Table 2.9.8: Representation of the PDV after Date is Read During the First Iteration
_N_ | _ERROR_ | FlightNum | Date | Destination | FirstClass | EconClass |
1 | 0 | 439 | 12/11/20 |  | . | . |
As demonstrated in Program 2.8.5, one way to prevent the truncation of Date values is to use a LENGTH statement to set the length of Date to at least 10 bytes. Another way to avoid this issue is to read Date with an informat, a concept covered in Chapter 3. This has the added benefit of converting the Date values to numeric, which allows for easier sorting, computations, and changes in display formats.
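The two approaches described above can be sketched as follows. This is a minimal illustration rather than a reproduction of Program 2.8.5: the data set names flights_len and flights_dt are hypothetical, the RawData fileref is assumed to be assigned as in the other programs in this section, and the MMDDYY10. informat is an assumption based on the appearance of values such as 12/11/2000.

```sas
/* Option 1: declare the length before INPUT so all 10 characters are kept. */
/* Hypothetical data set name; RawData is assumed to be assigned already.   */
data work.flights_len;
  infile RawData('flights.prn');
  length Date $ 10;
  input FlightNum Date $ Destination $ FirstClass EconClass;
run;

/* Option 2: read Date as a numeric SAS date using an informat.             */
/* The colon modifier keeps list-input behavior (read to next delimiter).   */
data work.flights_dt;
  infile RawData('flights.prn');
  input FlightNum Date : mmddyy10. Destination $ FirstClass EconClass;
  format Date mmddyy10.;  /* display the numeric date in a readable form */
run;
```

One side effect worth noting for the first option: because the LENGTH statement is the first place Date appears, Date becomes the first variable in the PDV and in the resulting data set.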
SAS continues through the input buffer and, since all variables in this example are read using simple list input, the reading and parsing follow the same process as before. Table 2.9.9 shows the starting and stopping position of each of the remaining variables. Note that SAS automatically advances the column pointer by one column after every variable, which places it at column 16 after Date is read. Because column 16 also contains a delimiter and list input scans for the next non-delimiter, the starting position for Destination in this record is column 17 rather than column 16.
Table 2.9.9: Column Pointers at the Starting and Ending Positions for Parsing the Remaining Values
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
4 | 3 | 9 |  | 1 | 2 | / | 1 | 1 | / | 2 | 0 | 0 | 0 |  |  | L | A | X |  | 2 | 0 |  | 1 | 3 | 7 |
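One way to observe the column pointer directly is the COLUMN= option on the INFILE statement, which stores the current pointer location in a named variable. The sketch below is illustrative only: the variable name Ptr is hypothetical, the RawData fileref is assumed to be assigned already, and the trailing @ is used simply to hold the record between INPUT statements so the pointer can be reported after each variable.

```sas
/* Sketch: report the column pointer after each variable in the first record. */
data _null_;
  infile RawData('flights.prn') column=Ptr;  /* Ptr is a hypothetical name */
  input FlightNum @;          /* trailing @ holds the record for the next INPUT */
  putlog 'After FlightNum:   ' Ptr=;
  input Date $ @;
  putlog 'After Date:        ' Ptr=;
  input Destination $ @;
  putlog 'After Destination: ' Ptr=;
  input FirstClass @;
  putlog 'After FirstClass:  ' Ptr=;
  input EconClass;            /* no trailing @, so the record is released */
  putlog 'After EconClass:   ' Ptr=;
  stop;                       /* only the first record is needed here */
run;
```

If the description of the pointer in this section holds, the value reported after Date should be 16, one column past the delimiter in column 15.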
Table 2.9.10 contains the final PDV for the first record, that is, the values that are sent to the data set at the end of the first iteration of the DATA step, which corresponds to the location of the RUN statement (or other step boundary).
Table 2.9.10: Representation of the PDV After Reading All Values
_N_ | _ERROR_ | FlightNum | Date | Destination | FirstClass | EconClass |
1 | 0 | 439 | 12/11/20 | LAX | 20 | 137 |
After this record is sent to the data set, the execution phase returns to the top of the DATA step and the variables are reinitialized as needed. _N_ is incremented to 2, _ERROR_ remains at zero since no tracked errors have been encountered, and the remaining variables are set back to missing in this case. Further iterations continue the process: the next row is loaded into the input buffer, values are parsed to the PDV, and those values are sent to the data set at the bottom of the DATA step. This implicit iteration terminates when the end-of-file marker is encountered.
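The iteration behavior described above can be watched in the log by writing the PDV at the top and bottom of each iteration. The following is a minimal sketch, assuming the RawData fileref is already assigned; the label text in the PUT statements is arbitrary.

```sas
/* Sketch: write the PDV to the log before and after the INPUT statement. */
data work.flights;
  infile RawData('flights.prn');
  put 'Top of iteration (before INPUT):';
  put _all_;   /* all PDV variables, including _N_ and _ERROR_ */
  input FlightNum Date $ Destination $ FirstClass EconClass;
  put 'Bottom of iteration (after INPUT):';
  put _all_;
run;
```

The log from such a step should show the variables read by INPUT as missing at the top of each iteration and populated at the bottom, with _N_ increasing by one each time. The step ends when the INPUT statement attempts to read past the end of the file, so a final top-of-iteration entry appears without a matching bottom entry.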
2.9.3 Debugging the DATA Step
Following the process from Section 2.9.1 may seem tedious at first, but understanding how SAS parses data when moving it from the raw file through the input buffer then to the PDV (and ultimately to the data set) is crucial for success in more complex cases. It is often inefficient to develop a program using a trial-and-error approach; instead, knowledge of the data-handling process ensures a smoother, more reliable process for developing programs. This section discusses several statements that SAS provides to help follow aspects of the parsing process through iterations of the DATA step. Program 2.9.2 demonstrates the LIST statement using Input Data 2.9.1.
Program 2.9.2: Demonstrating the LIST Statement
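data work.flights;
  /* RawData is a fileref assumed to be assigned earlier */
  infile RawData('flights.prn');
  input FlightNum Date $ Destination $ FirstClass EconClass;
  list;  /* write the input buffer for each record to the log */
run;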
The LIST statement writes the contents of the input buffer to the log at the end of each iteration of the DATA step and prints a ruler before the first record is written. Log 2.9.2 shows the results of including the LIST statement in Program 2.9.2 for the first five records. The complete input buffer, including the delimiters, appears for each record.
Log 2.9.2: Demonstrating the LIST Statement
RULE: ----+----1----+----2----+----3----+----4
1 439 12/11/2000 LAX 20 137 26
2 921 12/11/2000 DFW 20 131 25
3 114 12/12/2000 LAX 15 170 26
4 982 12/12/2000 dfw 5 85 25
5 439 12/13/2000 LAX 14 196 25
Before writing the input buffer contents for the first time, the LIST statement prints a ruler to the log.
The LIST statement writes the complete input buffer for the record.
If the records have variable length, then the LIST statement also prints the number of characters in the input buffer.
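Because LIST is an executable statement, it can be placed in an IF-THEN statement to limit how much is written to the log. The sketch below is one possible way to restrict the output to the first five records; it is not necessarily how Log 2.9.2 was produced, and the RawData fileref is assumed to be assigned already.

```sas
/* Sketch: write the input buffer to the log for the first five records only. */
data work.flights;
  infile RawData('flights.prn');
  input FlightNum Date $ Destination $ FirstClass EconClass;
  if _N_ <= 5 then list;  /* _N_ counts iterations of the DATA step */
run;
```

Limiting the output this way keeps the log manageable when the raw file is large, while still showing enough records to check how the delimiters line up against the ruler.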
Since the log is a plain-text environment, SAS cannot display non-printable characters such as tabs. However, in these cases, SAS prints additional information to the log to ensure an unambiguous representation of the input buffer. Program 2.9.3 demonstrates the results of using the LIST statement with a tab-delimited file.
Program 2.9.3: LIST Statement Results with Non-Printable Characters
data work.flights;
infile RawData('flights.txt') dlm = '09'x;
input FlightNum Date $ Destination $ FirstClass