pairs form in other structural conditions. Canonical structures are therefore the duplex structures composed of Watson–Crick base pairs, whereas non-canonical structures include non-Watson–Crick base pairs such as Hoogsteen base pairs.
Figure 1.2 Chemical structures of Watson–Crick and Hoogsteen base pairs.
1.4 Nucleic Acid Structures Including Non-Watson–Crick Base Pairs
Alongside the extensive efforts to determine the duplex structure formed by Watson–Crick base pairs, Hoogsteen base pairs were also found in nucleic acid structures in the 1960s. Felsenfeld and Rich explained how poly(rU) strands might associate with poly(rA)-poly(rU) duplexes to form triplexes [8]. NMR chemical shifts provided evidence for triplex formation via G–C+ Hoogsteen base pairs, protonated at cytosine N3, in a poly(dG)-poly(dC) complex with dGMP at low pH [9]. In 1962, it was found that short guanine-rich stretches of DNA could assume unusual structures [10]. Diffraction studies of poly(guanylic acid) gels suggested that four guanines in close proximity could form planar hydrogen-bonded arrangements, now called guanine quartets (G-quartets). A stack of a few G-quartets forms a tetraplex structure called a G-quadruplex (see Chapter 2). Among crystal structures of polynucleotides, Hoogsteen base pairs were first found in tRNA [11]. In that structure, Watson–Crick base pairs formed the secondary structure of the tRNA, whereas Hoogsteen base pairs supported the tertiary structure. Other types of non-Watson–Crick base pairs were also found in tRNA structures. Tertiary structure is especially important for non-coding RNAs, which are not translated into proteins. A landmark in non-coding RNA research was the discovery of the ribozyme (ribonucleic acid enzyme) by Thomas Robert Cech in 1982 [12]. Ribozymes catalyze chemical reactions, as protein enzymes do. Later structural studies revealed that many non-Watson–Crick base pairs form the catalytic core of ribozymes. Non-Watson–Crick base pairs, including Hoogsteen base pairs, have therefore been regarded as building blocks for the tertiary structures of nucleic acids other than duplexes.
With the progress of structural analysis technology in the 1990s, Hoogsteen base pairs were gradually revealed to exist in DNA complexes with low-molecular-weight compounds and proteins, as well as transiently in the Watson–Crick-type double helix. Furthermore, another type of tetraplex structure was identified in cytosine-rich DNA sequences; it forms through the cross-intercalation of hemiprotonated cytosine–cytosine (C–C+) base pairs under acidic conditions [13]. This tetraplex is called the i-motif (see Chapter 2). Soon after, the roles of non-canonical structures began to attract attention. Especially since the 2000s, research on the G-quadruplex structure, which is formed from Hoogsteen base pairs, has made remarkable progress. When a G-quadruplex forms on DNA or RNA, the activity of proteins involved in gene expression is affected (see Chapters 6–8). This means that the central dogma proposed by Crick, in which genetic information flows through replication, transcription, and translation, is tightly controlled by G-quadruplex formation. In general, the regulation of gene expression has been attributed to protein functions. However, specific structures built from Hoogsteen base pairs also control gene expression, so the nucleic acid itself can function like a protein. That is, the roles of nucleic acids may be divided according to the type of base pair: Watson–Crick base pair = information, non-Watson–Crick base pair = function. Many sequences that can adopt a G-quadruplex structure are located in the telomeres at the ends of chromosomes and in the promoter regions of oncogenes. Since a first report in 2013, there have been many reports on the formation of G-quadruplexes and i-motifs in cells. These reports suggest that oncogenes may be activated by the formation (or dissociation) of G-quadruplexes, leading to cancer (see Chapter 11).
Furthermore, it has been suggested that the phase-separated structure formed by the aggregation of RNAs with G-quadruplexes contributes to neurological diseases such as amyotrophic lateral sclerosis (see Chapter 12).
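As a rough illustration of how sequences with G-quadruplex-forming potential can be located in telomeric or promoter DNA, the following Python sketch scans a sequence with a simple pattern. The pattern is an assumption introduced here for illustration and is not taken from this chapter: four runs of at least three guanines separated by loops of one to seven nucleotides, a heuristic commonly used in sequence searches. The function name find_putative_g4 is likewise hypothetical. A match indicates only that a sequence might fold into a G-quadruplex; whether it does depends on the molecular environment.

```python
import re

# Assumed heuristic (not from this chapter): four G-runs of >= 3 guanines,
# separated by three loops of 1-7 nucleotides each. U is allowed so that
# RNA sequences can be scanned as well.
G4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def find_putative_g4(sequence):
    """Return (start, end, motif) for each non-overlapping pattern match.

    Positions are 0-based and the end index is exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Four copies of the human telomeric repeat TTAGGG.
telomere = "TTAGGG" * 4
for start, end, motif in find_putative_g4(telomere):
    print(start, end, motif)
```

Only the guanine-rich strand is scanned here; a fuller search would also examine the reverse complement, which corresponds to the cytosine-rich strand where an i-motif might form.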
1.5 Perspective of the Research for Non-canonical Nucleic Acid Structures
As the regulation of gene expression by specific nucleic acid structures has been clarified, the next important issue is to determine which structures form, and where and when they form, in cells. Hoogsteen base pairs, for example, are affected by the molecular environment, including ions, pH, and water activity. Cells are crowded with molecules, a condition known as molecular crowding (see Chapters 3 and 4), and the molecular environment changes depending on the cell cycle [14]. For example, the nucleolus changes the molecular density in the nucleus by repeatedly forming and dissociating according to the cell cycle. This regulates the timing of rRNA transcription in each cell cycle, because rRNA transcription occurs specifically in the nucleolus. In addition, the environment of mitochondria is particularly crowded (up to 500 mg ml−1) but heterogeneous, owing to the locally increased proton concentration produced by the proton gradient required for ATP synthesis. It is therefore desirable to develop technology that can predict the physicochemical properties of specific structures formed by Hoogsteen base pairs in each characteristic molecular environment [15]. In addition, it may be possible to establish a new approach to drug development that treats diseases by changing the molecular environment of cells, rather than by targeting genes and proteins.
1.6 Conclusion and Perspective
According to a personal communication from Pauling, made public through the Nobel Foundation's disclosure, he considered Watson and Crick's Nobel award to be premature. Despite his opinion, the Nobel Foundation decided to award the Prize to Watson and Crick. This might suggest that, at that time, Watson–Crick base pairs were regarded as common and meaningful, whereas non-Watson–Crick base pairs were regarded as artifacts without significance. Nowadays, non-Watson–Crick base pairs are increasingly recognized as common and significant, as Pauling perhaps predicted. The day when our understanding of the essence of nucleic acids goes beyond the concept of Watson and Crick is drawing closer.
References
1 Schrödinger, E. (1944). What Is Life? The Physical Aspect of the Living Cell and Mind. Cambridge: Cambridge University Press.
2 Tamm, C., Hodes, M., and Chargaff, E. (1952). J. Biol. Chem. 195: 49–63.
3 Watson, J.D. and Crick, F.H. (1953). Nature 171: 737–738.
4 Pauling, L. and Corey, R.B. (1953). Nature 171: 346.
5 (a) Hoogsteen, K. (1959). Acta Crystallogr. 12: 822–823. (b) Hoogsteen, K.R. (1963). Acta Crystallogr. 16: 907–916.
6 (a) Day, R.O.,