Therese Hardin

Concepts and Semantics of Programming Languages 1


Скачать книгу

object code produced by the compiler may be either binary code encoding instructions directly or source text in assembly code. In the latter case, a program – known as the assembler – must be called to transform this low-level source code into binary code. Generally speaking, assemblers simply produce a mechanical translation of instructions written mnemonically (mov, add, jmp, etc.) into binary representations. However, certain more sophisticated assemblers may also carry out optimization operations at this level.

      Assembling mnemonic code into binary code is a very simple operation, which does not alter the structure of the program. The reference manual of the target CPU provides, for each instruction, the meaning of the bits of the corresponding binary word. For example, the reference manual for the MIPS32® architecture [MIP 13] describes the 32-bit binary format of the instruction ADD rd, rs, rt (with the effect rd ← rs + rt on the registers) as:

      Figure 1.4. Coding the ADD instruction in MIPS32®

      1.2.3.5. Linking

      A single program may be made up of several source files, compiled separately. Once the object code from each source file has been produced, all these codes must be collected into a single executable file. Each object file includes “holes”, indicating unknown information at the moment of production of this object code. It is important to know where to find this missing code, when calling functions defined in a different compilation unit, or where to find variables defined in a location outside of the current unit.

      The linker has to gather all the object files and fill all the holes. Evidently, for a set of object files to lead to an executable file, all holes must be filled; so the code of every function called in the source must be available. The linking process also has to integrate the needed code, if it comes from some libraries, whether from the standard language library or a third-party library. There is one final question to answer, concerning the point at which execution should begin. In certain languages (such as C, C++ and Java), the source code must contain one, and only one, special function, often named main, which is called to start the execution. In other languages (such as Python and OCaml), definitions are executed in the order in which they appear, defined by the file ordering during the linking process. Thus, “executing” the definition of a function does not call the function: instead, the “value” of this function is created and stored to be used later when the function is called. This means that programmers have to insert into the source file a call to the function which they consider to be the “starting point” of the execution. This call is usually the final instruction of the last source file processed by the linker.

Schematic illustration of compilation process.

      1.2.3.6. Interpretation and virtual machines

      There are several advantages of using a VM: the compiler no longer needs to take the specificities of the CPU into account, the code is often more compact and portability is higher. As long as the executable file for the virtual machine interpreter is available on a computer, it will be possible to generate a binary file for the computer in question. The drawback to this approach is that the programs obtained in this way are often slower than programs compiled as “native” machine code.

      2

      Introduction to Semantics of Programming Languages

      This chapter introduces intuitively the notions of name, environment, memory, etc., along with a first formal description of these notions. It allows readers to familiarize themselves with the semantic approach of programming that we share with a number of other authors [ACC 92, DOW 09, DOW 11, FRI 01, WIN 93].

      Any high-level programming language uses names to denote the entities handled by programs. These names are generally known as identifiers, drawing attention to the fact that they are constructed in accordance with the syntactic rules of the chosen language. They may be used to denote program-specific values or values computed during execution. They may also denote locations (i.e. addresses in the memory), they are then called mutable variables. And identifiers can also denote operators, functions, procedures, modules, objects, etc., according to the constructs present in the language. For example, pi is often used to denote an approximate value of π; + is also an identifier, denoting an addition operator and often placed between the two operands, i.e. in infix position, as in 2 + 3. The expression 2 * x + 1 uses the identifier x and to compute its value, we need to know the value denoted by x. Retrieving the value associated with a given identifier is a mechanism at the center of any high-level language. The semantics of a language provides a model of this mechanism, presented – in a simplified form – in section 2.1.

       2.1.1. Evaluation environment

      Let X be a set of identifiers and V a set of values. The association of an identifier x ∈ X with a value v ∈ V is called a binding (of the identifier to its value), and a set Env of bindings is called an execution environment or evaluation environment. Env(x) denotes the value associated with the identifier x in Env. The set of environments is denoted as E.

      In practice, the set of identifiers X that are actually used is finite: usually, we only consider those identifiers that appear in a program. An environment may thus be represented by a list of bindings, also called Env: