Konrad Hinsen

Computation in Science (Second Edition)


Скачать книгу

that a machine can apply them unambiguously.

      Once we get rid of the idea that computation is about numbers, we can easily identify other operations that qualify as computations. One example is solving equations by algebraic manipulations. The steps leading from

      y+2x=z

      to

      x=12(z−y)

      are completely mechanical and can be formulated as an algorithm. The practical evidence is that computers can do the job. Software packages that implement such operations are called computer algebra systems, emphasizing algebraic manipulations. However, computer algebra systems also perform other non-numerical algorithms, for example finding the derivative or integral of an elementary function. The algorithm for computing derivatives is simple and taught in every calculus course. In contrast, the algorithm for computing indefinite integrals is very complicated [2] and was first implemented as a computer program only in 1987 [3].

      A perhaps more surprising use of computation in mathematics is the validation of proofs. A proof is a sequence of deduction steps, each of which establishes the truth of a statement based on other statements already known to be true and a finite set of rules of logic deduction. In textbooks and mathematical publications, proofs are written concisely for human readers who have a prior knowledge of mathematics. But when all the details are spelled out, proofs consist of nothing but applications of a finite set of deduction rules. These rules are applied to statements that must respect the rules of a formal language. The use of computers to check such detailed proofs is becoming more and more common, both in mathematical research and in industrial applications. This involves writing the proof in some formal language, as a sequence of symbols. The ‘proof checking’ computation transforms this sequence of symbols into a ‘true’ or ‘false’ statement about the proof’s validity.

      Leaving the narrow scope of mathematics, we find a huge number of domains where computation is applied to textual data. Finding a word in a text is a computation: it transforms the input data (the text and the word to look for) into output data (the position in the text where the word occurs), both of which can be encoded as sequences of symbols. Likewise, aligning two DNA sequences, translating a sentence from English to French, or compiling a Fortran program, are computations in which the data being processed are text. In fact, numbers and text are the two kinds of elementary data from which almost everything else is made up by composition: images are represented as two-dimensional arrays of numbers, dictionaries as sets of word pairs, etc. However, this fundamental role of numbers and text is due to their importance to humans, not computers.

      Another kind of data that are becoming increasingly important for scientific computation are graphs, in particular when used to describe networks. Traffic flow, protein interactions in a cell, and molecular structures are all examples of what can be described by graphs. An example of a computation on graphs is a check for cycles. This transforms the input data (a graph) into output data (a list of all cycles in the graph). Encoding graphs, cycles, and lists as sequences of symbols is not as obvious as encoding numbers and text. Humans already represented numbers and text by sequences of symbols long before thinking about computers and computation. The human representation of graphs, on the other hand, takes the form of two-dimensional drawings. The precise techniques for representing graphs as sequences of symbols are beyond the scope of this book, but I will come back to the general question of information representation in computers in section 5.3.2.

      Computation as a tool

      The most visible role of computation in scientific research is its use as a tool. Experimentalists process their raw data, for example to correct for artifacts of their equipment. Then they fit a theoretical model to their data by adjusting parameters. Theoreticians compute numerical predictions from a model, in order to compare them to experimental results. Both experimentalists and theoreticians make heavy use of computation for understanding their data, in particular using visualization techniques.

      It is worth making a distinction between computation as a tool and computers as a tool. Computers perform computations, but they also deal with other tasks. Scientists use computers for communicating with their peers, looking up information in Wikipedia, and for controlling lab equipment. None of these tasks involve much visible computation, though there is computation going on under the hood. Today’s computers are as much communication devices as computation devices.

      Richard Feynman had written on his blackboard: ‘What I cannot create, I do not understand.’ We cannot understand a theory, a model, or an approximation, unless we have done something with it. One way to gain experience with mathematical models is to apply them to concrete situations. Another one, even more powerful, is to implement them as computer programs. Donald Knuth has expressed this very succinctly [4]:

      It has often been said that a person does not really understand something until he teaches it to someone else. Actually a person does not really understand something until he can teach it to a computer, i.e. express it as an algorithm. The attempt to formalize things as algorithms leads to a much deeper understanding than if we simply try to comprehend things in the traditional way.

      The utility of writing programs for understanding scientific concepts and mathematical models comes from the extreme rigor and precision required in programming. Communication between humans relies on shared knowledge, starting with the definition of the words of everyday language. A typical scientific article assumes the reader to have significant specialist knowledge in science and mathematics. Even a mathematical proof, probably the most precise kind of statement in the scientific literature, assumes many definitions and theorems to be known by the reader, without even providing a list of them. A computer has no such prior knowledge. We must communicate with a computer in a formal language which is precisely defined, i.e. there are clear rules, verifiable by a computer, that define what is and what isn’t a valid expression in the language. Every aspect of our science that somehow impacts a computed result must be expressed in this formal language in order to obtain a working computer program.

      Another reason why writing a program is often useful for understanding a mathematical model is that an algorithm is necessarily constructive. In the physical sciences, most theories take the form of differential equations. These equations fully define their solutions, and are also useful for reasoning about their general properties, but provide no obvious way of finding one. Writing a computer program requires, first of all, to think about what this program does, and then about how it should go about this.

      Implementing a scientific model as a computer program also has the advantage that, as a bonus, you get a tool for exploring the consequences of the model for simple applications. Computer-aided exploration is another good way to gain a better understanding of a scientific model (see [5, 6] for some outstanding examples). In the study of complex systems, with models that are directly formulated as algorithms, computational exploration is often the only approach to gaining scientific insight [7].

      A computer program that implements a theoretical model, for example a program written with the goal of understanding this model, is a peculiar written representation of this model. It is therefore an expression of scientific knowledge, much like a textbook or a journal article. We will see in the following chapters that much scientific knowledge can be expressed in the