Figure 4.4 Spontaneous reactions corrupt the DNA database.
Repair Processes
If there were no way to correct altered DNA, the rate of mutation would be intolerable. DNA excision and repair enzymes have evolved to detect and repair altered DNA. The role of these enzymes is to cut out (excise) the damaged portion of DNA and then to restore the correct base sequence. Much of our knowledge of DNA repair has been derived from studies on E. coli, but the general principles apply to other organisms such as ourselves. Repair is possible because DNA comprises two complementary strands. If the repair machinery can identify which of the two strands is damaged, that strand can be restored as good as new by rebuilding it so that it is again complementary to the undamaged strand.
Two types of excision repair are described in this section: base excision repair and nucleotide excision repair. The common themes for each of these repair mechanisms are: (i) an enzyme recognizes the damaged DNA, (ii) the damaged portion is removed, (iii) DNA polymerase inserts the correct nucleotide(s) into position (according to the base sequence of the second DNA strand), and (iv) DNA ligase joins the newly repaired section to the remainder of the DNA strand.
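The four shared steps can be sketched as a toy Python model. It assumes, for simplicity, that the two strands are written aligned so that base i of one pairs with base i of the other (real strands are antiparallel), and that damage is located by a caller-supplied recognizer. The names `excision_repair` and `find_uracil` are hypothetical, chosen only for this illustration.

```python
from typing import Callable, Optional, Tuple

# Base-pairing rule DNA polymerase uses when reading the template strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
GAP = "-"  # marks an excised position awaiting resynthesis


def excision_repair(
    damaged: str,
    template: str,
    recognize: Callable[[str], Optional[Tuple[int, int]]],
) -> str:
    """Repair one damaged patch of `damaged` using the undamaged `template`.

    Toy model: base i of `damaged` pairs with base i of `template`.
    """
    # (i) Recognition: locate the damaged patch as a half-open range.
    patch = recognize(damaged)
    if patch is None:
        return damaged  # nothing recognized, nothing to repair
    start, end = patch

    # (ii) Excision: remove the damaged nucleotides, leaving a gap.
    strand = list(damaged)
    strand[start:end] = [GAP] * (end - start)

    # (iii) Polymerase: fill each gap with the base complementary to the template.
    for i in range(start, end):
        strand[i] = PAIR[template[i]]

    # (iv) Ligase: join the new nucleotides to the rest of the strand.
    return "".join(strand)


def find_uracil(strand: str) -> Optional[Tuple[int, int]]:
    """Hypothetical recognizer: flag the first uracil, a base that never belongs in DNA."""
    i = strand.find("U")
    return None if i == -1 else (i, i + 1)


print(excision_repair("ATGUTA", "TACGAT", find_uracil))
```

Here the uracil opposite the template G is replaced by C, giving `ATGCTA`. Passing the recognizer in as a function keeps steps (ii) to (iv) identical while letting the recognition step differ, which mirrors how the two repair pathways share their later stages.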
Base excision repair is needed to repair DNA that has lost a purine (depurination), or in which a cytosine has been deaminated to uracil (U). Although uracil is a normal constituent of RNA, it does not form part of undamaged DNA and is recognized and removed by the repair enzyme uracil‐DNA glycosylase (Figure 4.4). This leaves a deoxyribose with no base attached to it. There is no enzyme that can simply reattach a C to the vacant sugar. Instead, an enzyme called AP endonuclease recognizes the baseless sugar and removes it by breaking the phosphodiester bonds on either side (Figure 4.6). When DNA has been damaged by the loss of a purine (Figure 4.4), AP endonuclease likewise removes the sugar that has lost its base. The AP in the enzyme's name means apyrimidinic (without a pyrimidine) or apurinic (without a purine).
The repair process for reinserting a purine or a pyrimidine into DNA is now the same (Figure 4.6). DNA polymerase I inserts the appropriate deoxyribonucleotide into the gap. DNA ligase then seals the strand by catalyzing the re‐formation of a phosphodiester bond.
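The way the two base excision repair routes converge on the same baseless sugar can be traced in a toy string model. The symbols `*` (sugar with no base) and `-` (sugar removed) and the function name `base_excision_repair` are illustrative assumptions, and the strands are again aligned base for base rather than antiparallel.

```python
from typing import List

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
AP_SITE = "*"  # sugar still in the backbone, but its base is gone
GAP = "-"      # sugar removed too: a one-nucleotide gap


def base_excision_repair(strand: str, template: str) -> List[str]:
    """Return the successive states of `strand` during base excision repair.

    Toy model: base i of `strand` pairs with base i of `template`.
    """
    states = [strand]
    s = list(strand)

    # Uracil-DNA glycosylase: cut each uracil off its sugar, leaving an AP site.
    if "U" in s:
        s = [AP_SITE if b == "U" else b for b in s]
        states.append("".join(s))

    # AP endonuclease: remove each baseless sugar. A depurinated strand
    # enters the pathway here, because it already carries an AP site.
    s = [GAP if b == AP_SITE else b for b in s]
    states.append("".join(s))

    # DNA polymerase I: insert the base complementary to the template opposite
    # each gap. DNA ligase then reseals the backbone; that does not change the
    # sequence, so it adds no new state in this string model.
    s = [PAIR[template[i]] if b == GAP else b for i, b in enumerate(s)]
    states.append("".join(s))
    return states


print(base_excision_repair("ATGUTA", "TACGAT"))  # cytosine deaminated to uracil
print(base_excision_repair("AT*CTA", "TACGAT"))  # purine (G) lost
```

Both runs end with `ATGCTA`; the depurinated strand simply skips the glycosylase step and joins the common pathway at the AP endonuclease.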
Nucleotide excision repair is required to correct a thymine dimer. The thymine dimer, together with some 30 surrounding nucleotides, is excised from the DNA. Repairing bulky damage of this type requires several proteins, because the exposed, undamaged DNA strand must be protected from nuclease attack while the damaged strand is rebuilt by DNA polymerase I and DNA ligase.
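The larger patch removed in nucleotide excision repair can be sketched the same way. In this hypothetical model a thymine dimer is written as lowercase `tt`, the excised window is taken as 30 nucleotides (the figure given above) roughly centred on the dimer, and the strands are aligned base for base. The centring and the string conventions are simplifications, not a description of where the real cuts fall.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nucleotide_excision_repair(strand: str, template: str, patch: int = 30) -> str:
    """Excise a thymine dimer plus flanking nucleotides, then resynthesize the patch.

    Toy conventions: a dimer is written "tt"; base i of `strand` pairs with
    base i of `template`.
    """
    assert patch >= 2, "the excised window must at least cover the dimer"
    i = strand.find("tt")
    if i == -1:
        return strand  # no dimer recognized
    # Excise a window of `patch` nucleotides centred on the dimer,
    # clipped so it stays within the strand.
    start = max(0, i + 1 - patch // 2)
    end = min(len(strand), start + patch)
    # DNA polymerase I rebuilds the window from the undamaged template;
    # DNA ligase joins it to the flanking DNA (string concatenation here).
    new_patch = "".join(PAIR[b] for b in template[start:end])
    return strand[:start] + new_patch + strand[end:]


undamaged = "GCAATTCG" * 5   # 40 nt, invented sequence
template = "CGTTAAGC" * 5    # its complement, base for base
damaged = undamaged[:4] + "tt" + undamaged[6:]  # dimer at positions 4 and 5
print(nucleotide_excision_repair(damaged, template) == undamaged)
```

Unlike the single‐base case, the whole 30‐nucleotide window is resynthesized, so every base in it is copied from the template, which is why the undamaged strand must stay intact while the patch is rebuilt.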
Even with all these protection systems in place, the cell divisions that create and repair our bodies generate errors, so that the adult human contains many cells with somatic mutations. Most are irrelevant to the specialized operation of that cell, or merely reduce its ability to function. Some, however, can cause cancer (page 241).
GENE STRUCTURE AND ORGANIZATION IN EUKARYOTES
Introns and Exons – Additional Complexity in Eukaryotic Genes
Genes that code for proteins should be simple things: DNA makes RNA makes protein, and a gene codes for the amino acids of a protein by the three‐base genetic code. In prokaryotes, indeed, a gene is a continuous series of bases that, read in threes, code for the protein. This simple and apparently sensible system does not apply in eukaryotes. Instead, the protein‐coding regions of almost all eukaryotic genes are organized as a series of separate segments interspersed with noncoding regions. The protein‐coding regions of these split genes are called exons; the regions between them are called introns, short for intervening sequences. At the bottom of Figure 4.7 we show the structure of the β‐globin gene, which contains three exons and two introns. Introns are often very long compared to exons. As in prokaryotes, messenger RNA complementary to the DNA is synthesized, but the introns are then spliced out before the mRNA leaves the nucleus (page 22). This means that a gene is much longer than the mRNA that ultimately codes for the protein. The name exon refers to the expressed regions of the gene: these are the parts that, once transcribed into mRNA, exit from the nucleus.
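Splicing can be illustrated with a toy gene laid out like the one described above, three exons separated by two introns. The sequences and lengths are invented for the example (real introns are often far longer than this), and `build_gene` and `splice` are hypothetical helper names.

```python
from typing import List, Tuple

# Invented primary transcript: (kind, sequence) segments in 5'-to-3' order.
TOY_GENE = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCCCUUUAG"),
    ("exon", "GAAUUC"),
    ("intron", "GUGAGUAAAAAG"),
    ("exon", "UGGUAA"),
]


def build_gene(segments: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[int, int]]]:
    """Concatenate segments; return the transcript and half-open exon coordinates."""
    parts, exons, pos = [], [], 0
    for kind, seq in segments:
        if kind == "exon":
            exons.append((pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), exons


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Keep only the exon segments, joined in order; the introns are discarded."""
    return "".join(pre_mrna[start:end] for start, end in exons)


pre_mrna, exons = build_gene(TOY_GENE)
mrna = splice(pre_mrna, exons)
print(len(pre_mrna), len(mrna), mrna)
```

The 44‐base transcript shrinks to an 18‐base mRNA, `AUGGCUGAAUUCUGGUAA`, which reads in threes from AUG to the stop codon UAA, showing in miniature why a gene is longer than the mRNA it gives rise to.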
In fact, there is an evolutionary rationale to this apparently perverse arrangement. As we will see (page 114), a single protein is often composed of a series of domains, with each domain performing a different role. The breaks between exons often correspond to domain boundaries. During evolution, reordering of exons has created new genes that have some of the exons of one gene and some of the exons of another, thereby generating novel proteins composed of new arrangements of domains, each of which still does its job.
Medical Relevance 4.2 Bloom's Syndrome and Xeroderma Pigmentosum
DNA helicases are essential proteins required to open up the DNA helix during replication. In Bloom's syndrome, mutations give rise to a defective helicase. The result is excessive chromosome breakage, and affected people are predisposed to many types of cancer at a young age.
People who suffer from the genetic disorder xeroderma pigmentosum are deficient in one of the enzymes of nucleotide excision repair. As a result, they are very sensitive to ultraviolet light. They develop skin cancer after even very brief exposure to sunlight, because the thymine dimers produced by ultraviolet light are not excised from their DNA.
The Major Classes of Eukaryotic DNA
We do not yet fully understand how our nuclear genome is constructed. Only about 1.1% of the human genome consists of protein‐coding exon sequence, while about 24% consists of introns. Most protein‐coding genes occur only once in the genome and are called single‐copy genes.
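The two percentages above imply some quick arithmetic. The sketch below uses only the 1.1% and 24% figures from the text, plus one labelled assumption: a haploid human genome of roughly 3.1 billion base pairs, used solely to turn a percentage into an absolute length.

```python
EXON_PCT = 1.1     # protein-coding exon sequence, % of genome (from the text)
INTRON_PCT = 24.0  # intron sequence, % of genome (from the text)
GENOME_BP = 3.1e9  # assumption: approximate size of the haploid human genome

other_pct = 100 - EXON_PCT - INTRON_PCT   # neither coding exon nor intron
intron_to_exon = INTRON_PCT / EXON_PCT    # intron bases per coding-exon base
coding_bp = GENOME_BP * EXON_PCT / 100    # absolute amount of coding sequence

print(f"Neither coding exon nor intron: {other_pct:.1f}% of the genome")
print(f"Intron DNA per base of coding exon: about {intron_to_exon:.0f}")
print(f"Coding exon DNA: about {coding_bp / 1e6:.0f} million base pairs")
```

On these numbers roughly three‐quarters of the genome (74.9%) is neither coding exon nor intron, and introns outweigh protein‐coding sequence by about 22 to 1.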
Many genes have been duplicated at some time during their evolution. Mutation over the succeeding generations causes the initially identical copies to diverge in sequence and produce what is known as a gene family. Members of a gene family usually have a related function, for example the products of the globin family transport oxygen from our lungs to our tissues. These genes generate related proteins or isoforms, which are often distinguished by placing