quantitatively, we can immediately see that there are 64 possible codons. There are three positions in each codon, and we already know that at each of these positions there are four possible bases: G, C, A, and U. This gives us 4 × 4 × 4 = 64 possible combinations for each three-letter codon. But we already saw in the previous chapter that there are generally only 20 amino acids used by life. As a result, there is redundancy. Most amino acids can be coded for by more than one codon. We call this the “degeneracy of the genetic code.” Figure 5.12 shows the mRNA codons and their corresponding amino acids. If you look at this table, you can see that each amino acid generally has more than one codon. Apart from tryptophan and methionine (which each have just one codon), all other amino acids have at least two, many of them have four, and a few (leucine, serine, and arginine) have six.
Figure 5.12 The table of codons of mRNA corresponding to amino acids. The amino acids are shown with their three-letter designation (see Appendix A6). Note the degeneracy of the code. Apart from small variations in different species, this table is universal across life, one line of evidence that all known life on Earth derives from a common origin.
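The counting argument above can be checked with a short Python sketch. It builds the standard codon table and counts how many codons map to each amino acid. The one-letter amino acid codes (rather than the three-letter designations of Figure 5.12), the "*" symbol for Stop, and the names BASES, AMINO_ACIDS, CODON_TABLE, and degeneracy are illustrative choices made here, not part of the text.

```python
from collections import Counter
from itertools import product

# The four RNA bases, in the order conventionally used to lay out codon tables.
BASES = "UCAG"

# Standard genetic code as one-letter amino acid codes ("*" marks a Stop codon),
# listed in the same order that product(BASES, repeat=3) generates codons:
# UUU, UUC, UUA, UUG, UCU, ... , GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every three-letter codon to the amino acid (or Stop) it specifies.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid: the degeneracy of the code.
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print("Total codons:", len(CODON_TABLE))
    for amino_acid, count in sorted(degeneracy.items(), key=lambda kv: kv[1]):
        print(amino_acid, count)
```

Sorting by count puts tryptophan (W) and methionine (M) first, each with a single codon, and leucine (L), serine (S), and arginine (R) last, with six codons each.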
Discussion Point: Why Is There Degeneracy in the Genetic Code?
The degeneracy of the genetic code (genetic degeneracy) results from a simple consideration of the mathematics of the genetic code. As the code uses base pairs, which allow the DNA molecule to be opened down the middle and two identical helices to be synthesized, it necessarily has an even number of bases. Consider a genetic code with only two bases. If it had a codon with three positions, like our own code, it would produce 2 × 2 × 2 = 8 possible codons. This is not enough to code for the 20 amino acids required by the life that we know. The only way such a two-base genetics could produce enough codes to have 20 amino acids would be to have five positions on a codon to produce: 2 × 2 × 2 × 2 × 2 = 32 codes, leaving a degeneracy of 10 (assuming that we use one codon as a Start and one as a Stop codon). You can also consider a code with six bases instead of four. If it had only two positions in a codon, it would give 6 × 6 = 36 codes, which is enough to code for the amino acids known in life, with 16 places left over for redundancy (or 14 if one Start and one Stop codon are also set aside). The main point to realize here is that in whatever way we make the genetic code, we end up with either too few codes, or some left over, in other words degeneracy. Another interesting consideration is that a code with only two bases would have a very limited repertoire of coding. The DNA molecule might have to be longer, or there would need to be more of it, to code for the same information as in terrestrial life. A greater number of bases than our four bases (such as six or eight) leads to other potential problems, such as a greater frequency of mismatches between bases. It may not be chance that our code has four bases; perhaps it represents a process of biochemical optimization. What do you think?
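The arithmetic in this discussion point can be reproduced with a minimal Python sketch. The function names (codon_capacity, shortest_codon, spare_codons) and the default of 20 amino acids plus two reserved signals (one Start, one Stop) are assumptions made here to mirror the worked numbers above, not a model taken from the text.

```python
def codon_capacity(n_bases: int, codon_length: int) -> int:
    """Number of distinct codons: one choice of base per position."""
    return n_bases ** codon_length


def shortest_codon(n_bases: int, n_meanings: int) -> int:
    """Smallest codon length whose capacity covers n_meanings."""
    if n_bases < 2:
        raise ValueError("a code needs at least two bases")
    length = 1
    while codon_capacity(n_bases, length) < n_meanings:
        length += 1
    return length


def spare_codons(n_bases: int, codon_length: int,
                 n_amino_acids: int = 20, n_signals: int = 2) -> int:
    """Codons left over after assigning one to each amino acid and signal.

    A negative result means the code is too small.
    """
    return codon_capacity(n_bases, codon_length) - n_amino_acids - n_signals


if __name__ == "__main__":
    for n_bases in (2, 4, 6):
        length = shortest_codon(n_bases, 20 + 2)
        print(n_bases, "bases:", length, "positions,",
              codon_capacity(n_bases, length), "codons,",
              spare_codons(n_bases, length), "spare")
```

For every even number of bases tried, the shortest workable codon overshoots the 22 required meanings, which is the point made above: some degeneracy is unavoidable.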
Three codons (UAA, UAG, and UGA) code for the instruction to Stop reading, and one codon (AUG, which also codes for methionine) signals where to Start reading the mRNA strand.
Each amino acid brought to the mRNA in this way forms a peptide bond with the existing chain, and so as new tRNAs bind to the mRNA, a polypeptide or protein is synthesized, with the ribosome continuing to move along the mRNA strand. Thus, the mRNA sequence has been translated into the primary protein sequence. This primary sequence folds together to make the three-dimensional structure of a useful functioning protein.
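The reading of an mRNA strand from Start to Stop can be illustrated with a simplified Python sketch. It is a toy model under stated assumptions: the input is an upper-case string of the letters A, C, G, and U; reading begins at the first AUG found; and the ribosome, tRNAs, and protein folding are not modeled. The function name translate and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes; "*" marks a Stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Return the primary sequence encoded by an mRNA string.

    Reading starts at the first AUG (Start) and proceeds three bases at a
    time until a Stop codon or the end of the strand is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no Start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # Stop codon: release the chain
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical strand: GG | AUG GCC UGG AAA UAA | GC
    print(translate("GGAUGGCCUGGAAAUAAGC"))  # prints MAWK
```

The output is a string of one-letter amino acid codes, corresponding to the primary protein sequence described above.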
The sequence of bases that codes for a single protein is called a gene, and we call the entire complement of genes within an organism its genome. The genome size varies enormously between organisms. The human genome contains about 3240 million bases of DNA (3240 megabases, or Mb; sometimes also written as megabase pairs, or Mbp). Bacteria have up to about 13 Mb of DNA depending on the species, and they typically have about 4000 genes. One of the smallest genomes of a self-replicating cellular organism belongs to Carsonella ruddii, a bacterium that lives as a symbiont within the psyllids, a family of sap-feeding insects. It has a genome of just 160 000 bases (160 kilobases, written kb or kbp) of DNA and 182 protein-coding genes. The smallest flu virus (which cannot replicate on its own) has only 11 genes. Although genome size is very loosely linked to complexity (bacteria tend to have smaller genomes than animals), this relationship is by no means reliable. Some protozoa (single-celled eukaryotes) have larger genomes than humans. This lack of a reliable relationship between genome size and complexity is called the C-value paradox. The “C” refers to the quantity of DNA in the genome, which early researchers thought must be related to complexity.
Some of the DNA in an organism is referred to as non-coding DNA, as it has no known translation into protein. The amount of this non-coding DNA varies between species. In bacteria, it can be around 2% of the genome, and in humans it is about 98.5%. This DNA is sometimes called junk DNA, but the name is a misnomer, since it is becoming increasingly understood that a proportion of it has biochemical functions, for example producing RNA molecules, including the ribosomal RNA of ribosomes, or carrying sequences of viral origin. Some of the sequences are pseudogenes. These are sequences that code for proteins that are not produced by the cell, or that are non-functional replicas of other genes. Much of the so-called C-value paradox is explained by non-coding DNA.
The process of reading DNA to RNA to protein is sometimes called the “central dogma of molecular biology” (Figure 5.13). The word “dogma” is always a troubling word in science, but the overall scheme broadly shows the two fundamental steps of reading the genetic code. The word dogma was used to capture the observation that once genetic information is turned into protein, it cannot go in the reverse direction. The information in protein is not transferred back into nucleic acid in any known life.
Figure 5.13 A summary of the two steps in reading from DNA to RNA to protein.
5.6.3 A Remarkable Code
There are a few remarkable things about this process worth mentioning. First, the table of codons shown in Figure 5.12, bar some minor modifications in some organisms, is essentially universal to all life forms. This not only shows the great antiquity of the genetic code, but also strongly suggests that all life on Earth was derived from a single common ancestor in which this code first emerged, presumably from simpler codes and structures that preceded it.
Second, the table is astounding because it is a code that allows a one-dimensional piece of information (the strand of a DNA molecule) to be transformed into the three-dimensional structure of a chemically active molecule. How did the structures emerge, and what was the evolutionary process that linked one-dimensional information storage to three-dimensional biological function? This is one of the most fascinating questions in astrobiology, linked explicitly to our attempts to understand the origin of life. We come back to this when we consider the origin of life.
5.6.4 The Evolution of the Codons
How did the codon table evolve? It is instructive to notice that the codons for each amino acid are bunched together in the table. This has not gone unnoticed. One early hypothesis was that this arrangement leads to the minimization of errors. If the third location in the codon, which has a certain degree of “wobble,” changes, then the amino acid in the final protein is often not changed. Mathematical models show that the codon table, as constituted, can achieve this error minimization. There are also other intriguing observations. For example, arginine is able to bind directly to the RNA codes represented by its codons in the table, without the transfer RNAs. The same is true for isoleucine. Could these amino acids have once bound directly to the RNA, with the transfer RNA molecules becoming adaptors between the RNA and the amino acids later? These ideas are not necessarily mutually exclusive. Error minimization, direct associations between codons and amino acids, and other factors may all have played a role in evolving the table early in the history of life.
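The error-minimization idea can be explored with a rough Python sketch. For each of the three codon positions, it counts what fraction of single-base changes leaves the encoded amino acid unchanged. This is a deliberately crude count, not one of the mathematical models mentioned above: it assumes all substitutions are equally likely, weights all 64 codons equally, and treats a Stop-to-Stop change as silent. The name silent_fraction is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes; "*" marks a Stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def silent_fraction(position: int) -> float:
    """Fraction of single-base changes at a codon position (0, 1, or 2)
    that leave the encoded amino acid (or Stop signal) unchanged."""
    silent = 0
    total = 0
    for codon, amino_acid in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODON_TABLE[mutant] == amino_acid:
                silent += 1
    return silent / total


if __name__ == "__main__":
    for position in range(3):
        print(f"position {position + 1}: {silent_fraction(position):.3f}")
```

Under these assumptions, changes at the third position are silent far more often than changes at the first or second position, consistent with the “wobble” argument above.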
Discussion Point: The Universality of the Genetic Structure and Machinery
The DNA molecule seems a thing of remarkable peculiarity