sgplot data=work.cars;
hbar typeB / response=mpg_combo stat=mean limits=upper;
where typeB ne 'Hybrid';
run;
title 'MPG Five-Number Summary';
title2 'Across Types';
proc means data=work.cars min q1 median q3 max maxdec=1;
class typeB;
var mpg:;
run;
SAS code is written in statements, each of which ends in a semicolon. The statements indicated here (OPTIONS and TITLE) are examples of global statements, which take effect as soon as SAS compiles them. Typically, the effects remain in place for the rest of the SAS session unless another statement is submitted that alters them.
The SAS DATA step has a variety of uses; however, it is primarily a tool for creation or manipulation of data sets. A DATA step is generally comprised of several statements forming a block of code, ending with the RUN statement, the role of which is described in .
Procedures in SAS are used for a variety of tasks and, like the DATA step, are generally comprised of several statements. These are generically referred to as PROC steps.
The PROC MEANS result includes the variables MPG_City, MPG_Highway, and MPG_Combo even though none of these are explicitly written in the procedure code. The colon (:) at the end of a variable name acts as a wildcard indicating that any variable name starting with the given prefix is part of the designated set; this shortcut is known in SAS as a name prefix list. For other types of variable lists, see Chapter Note 3 in Section 1.7.
With DATA and PROC steps defined as blocks of code, each of these blocks is terminated with a step boundary. The RUN statement is commonly used as a step boundary, though it is not required for each DATA or PROC step. See Section 1.4.2 for details.
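To make the name prefix list concrete, the sketch below writes out the VAR statement in full. It assumes, as the PROC MEANS result suggests, that MPG_City, MPG_Highway, and MPG_Combo are the only variables in WORK.CARS whose names begin with MPG. Under that assumption, this step requests the same statistics as the version that uses mpg:, although the order of the variables in the output may differ.

```sas
proc means data=work.cars min q1 median q3 max maxdec=1;
  class typeB;
  /* Explicit variable list; assumed equivalent to VAR mpg: for this data set */
  var mpg_city mpg_highway mpg_combo;
run;
```

The prefix form is shorter and automatically includes any variable added later whose name starts with MPG, which may or may not be the desired behavior.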
1.4.2 SAS DATA and PROC Steps
SAS processing of code submissions includes two major components: compilation and execution. In some cases, individual statements are compiled and take effect immediately, while at other times, a series of statements is compiled as a set and then executed after the complete set is processed by the compiler. In general, statements that compile and take effect individually and immediately are global statements. Statements that compile and execute as a set are generally referred to as steps, with the SAS language including both DATA steps and procedure (or PROC) steps.
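As a minimal sketch of this distinction, the following code assumes the WORK.CARS data set used throughout this section; the title text and the OBS= value are illustrative only. The TITLE statements are global and take effect as soon as they are compiled, while the PROC PRINT statements are compiled as a set and execute only when the step boundary is reached.

```sas
/* Global statement: takes effect as soon as it is compiled */
title 'First Five Records';   /* illustrative title text */

/* PROC step: these statements are compiled together as a set */
proc print data=work.cars(obs=5);
  var typeB mpg_combo;
run;   /* step boundary: the PROC PRINT step executes here */

/* Global statement: a TITLE statement with no text clears all title lines */
title;
```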
The DATA step starts with a DATA statement, a PROC step starts with a PROC statement that includes the name of the procedure, and all steps end with some form of a step boundary. As noted in Program 1.4.1, a commonly used step boundary in the SAS language is the RUN statement, but it is technically not required for each step. Any invocation of a DATA or PROC step also acts as a step boundary because DATA and PROC steps cannot be nested within one another in the SAS language. In general, it is considered good programming practice to provide an explicit statement for the step boundary, rather than relying on the invocation of another DATA or PROC step to end the step implicitly. The code submissions in Figure 1.4.1 and Program 1.4.2 illustrate the advantages of explicitly defining the end of a step.
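The implicit step boundary can be sketched with a DATA step followed directly by a PROC step. The data set name WORK.CARS_COPY and the OBS= value are hypothetical and used only for illustration. Here the DATA step has no RUN statement, so it is the PROC statement that signals the end of the DATA step.

```sas
/* WORK.CARS_COPY is a hypothetical data set name used for illustration */
data work.cars_copy;
  set work.cars;
/* No RUN statement here: the DATA step is not yet complete */

/* This PROC statement acts as the step boundary for the DATA step above */
proc print data=work.cars_copy(obs=5);
run;   /* explicit step boundary for the PROC PRINT step */
```

This code runs, but adding a RUN statement after the SET statement makes the end of the DATA step visible to the reader and allows the DATA step to be highlighted and submitted on its own.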
In either the SAS windowing environment or SAS University Edition, portions of code can be compiled and executed by highlighting that section and then submitting. Having clear definitions from beginning to end for any DATA or PROC step aids in the ability to submit portions of code, which can be accomplished by using the RUN statement as an explicit step boundary. Figures 1.4.1A and 1.4.1B show submissions of the two PROC steps from Program 1.4.1 along with their associated TITLE statements.
Figure 1.4.1A: Submitting Portions of Code in SAS University Edition
Figure 1.4.1B: Submitting Portions of Code in the SAS Windowing Environment
This submission reproduces the bar chart and the table of statistics produced previously in Figure 1.3.7. However, notice that the result is somewhat different in the SAS windowing environment and SAS University Edition. In the SAS windowing environment, the output is added to the output from the previous submission (and the log from this submission is also added to the previous log information). In SAS University Edition, the output is replaced, and the sub-tab for Output Data is not present because the DATA step did not run. With default settings in place, submissions are cumulative for both log and output in the SAS windowing environment; conversely, replacement is the default in SAS University Edition. For more information about managing results in either environment, see Chapter Note 1 in Section 1.7.
Program 1.4.2 shows the code portion submitted in Figure 1.4.1 with the first RUN statement removed. Delete the RUN statement and resubmit the selection, then review the output (Figure 1.4.2) and the details below for another example of why explicitly ending steps in SAS is good programming practice.
Program 1.4.2: Multiple Steps Without Explicit Step Boundaries
title 'Combined MPG Means';
proc sgplot data=work.cars;
hbar typeB / response=mpg_combo stat=mean limits=upper;
where typeB ne 'Hybrid';
title 'MPG Five-Number Summary';
title2 'Across Types';
proc means data=work.cars min q1 median q3 max maxdec=1;
class typeB;
var mpg:;
run;
The first statement compiled and executed is this TITLE statement, which assigns the quoted/literal value as the primary title line.
The SGPLOT procedure is invoked for compilation and execution by this statement. Subsequent statements are compiled as part of the SGPLOT step until a step boundary is reached.
This is the position of the RUN statement in Program 1.4.1 and, when it is compiled in that program, it signals the end of the SGPLOT step. Assuming no errors, PROC SGPLOT executes at that point; however, with no RUN statement present in this code, compilation of the SGPLOT step is not complete and execution does not begin.
These two TITLE statements, which are global, now compile and take effect. Since the SGPLOT procedure has neither completed compilation nor started execution, the TITLE statement here replaces the first title line assigned in .
This statement starts the MEANS procedure which, because steps cannot be nested, indicates that the SGPLOT statements are complete. Compilation of the SGPLOT step ends and the step executes, with the titles in now placed erroneously on the graph.
Figure 1.4.2: Failing to Define the End of a Step
In any interactive session, the final step boundary must be explicitly stated. For a discussion of the differences between interactive and non-interactive sessions in the SAS windowing environment and SAS University Edition, see Chapter Note 2 in Section 1.7.
The remainder of Program 1.4.1 is a DATA step, which is shown as Program 1.4.3 with a few details about its operation highlighted. The DATA step is a powerful tool for data manipulation, offering a variety of functions, statements, and other programming elements. The DATA step is of such importance that it is featured in every chapter of this book.
Program 1.4.3: DATA Step from Program 1.4.1
data