proteins that bind to the regulatory regions of DNA (the promoter and enhancer regions). In eukaryotes, the regulatory proteins facilitate changes in the local chromatin structure to allow proper recruitment and binding of RNA polymerase to one of the DNA strands. Thus, the local chromatin structure either promotes or inhibits RNA polymerase and TF binding. Transcription begins once the RNA polymerase enzyme binds to the promoter region of the gene. Regulatory proteins, in conjunction with different combinations of TFs, dictate the frequency of synthesis of precursor messenger RNA (pre-mRNA) molecules (how many copies per unit of time). For instance, different combinations of TFs lead to different three-dimensional macromolecular conformations (the transcription mediator complex) [42]. These temporary macromolecular constructions (made of TFs and other proteins), and their interaction with chromatin, allow RNA polymerase to access the DNA sequence to a greater or lesser extent. The difficulty of recruitment imposes a probability distribution on binding. In turn, the binding probability of RNA polymerase sets the frequency of pre-mRNA synthesis. As a rule of thumb, a more open chromatin structure is associated with active gene transcription, while a more compact chromatin structure indicates transcriptional inactivity (no expression).
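The link between binding probability and synthesis frequency can be illustrated with a toy model. The sketch below is not taken from the cited literature. It assumes that each discrete time step offers one independent binding opportunity and that each successful binding yields exactly one pre-mRNA. The probabilities 0.60 (open chromatin) and 0.05 (compact chromatin) are arbitrary illustrative values, and the function names are hypothetical.

```python
import random

def expected_transcripts(n_steps, p_bind):
    """Expected number of pre-mRNAs after n_steps independent binding attempts."""
    return n_steps * p_bind

def simulate_transcripts(n_steps, p_bind, seed=0):
    """Count successful binding events; each success is assumed to yield one pre-mRNA."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(1 for _ in range(n_steps) if rng.random() < p_bind)

# Hypothetical binding probabilities, chosen only for illustration.
P_OPEN, P_COMPACT = 0.60, 0.05

open_count = simulate_transcripts(10_000, P_OPEN)
compact_count = simulate_transcripts(10_000, P_COMPACT)
print("open chromatin:   ", open_count, "transcripts")
print("compact chromatin:", compact_count, "transcripts")
```

Under these assumptions, the simulated counts stay close to the expected values returned by `expected_transcripts`, so a higher binding probability translates directly into more pre-mRNA copies per unit of time.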
1.5.2 Precursor Messenger RNA to Messenger RNA
The DNA region of a gene can be subdivided into other types of regions. In eukaryotes especially, many genes are organized into coding regions (exons) and noncoding regions (spliceosomal introns). Both exons and introns are transcribed into a continuous pre-mRNA molecule. While the pre-mRNA is being transcribed, the intronic regions are removed by the spliceosome (a ribonucleoprotein complex), and the ligation of the exon regions forms the mRNA. The process of intron removal is called splicing. Ligation of all exons in the same order in which they appear in the gene is called constitutive splicing. Thus, constitutive splicing allows for a “one gene–one protein” model (or, one pre-mRNA – one mRNA). When exon ligation departs from this pattern (e.g., certain exons are skipped), several mRNA variants are produced from a single pre-mRNA. If the gene encodes proteins, then each mRNA variant can generate a different protein (protein isoforms). This process is known as alternative splicing, and it is responsible for the large diversity of protein types in the eukaryotic proteome.
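The difference between constitutive splicing and exon skipping can be sketched on a toy sequence. The example below is a minimal illustration, not a model of the spliceosome: the exon and intron sequences are invented, exon boundaries are assumed to be known in advance, and `build_pre_mrna` and `splice` are hypothetical helper names.

```python
# Invented toy sequences; real exons and introns are far longer.
EXONS = ["AUGGCU", "GAAUUC", "UGGUAA"]
INTRONS = ["GUAAGUUUUAG", "GUCCCCAG"]

def build_pre_mrna(exons, introns):
    """Interleave exons and introns; return the sequence and the exon (start, end) spans."""
    seq, spans = "", []
    for i, exon in enumerate(exons):
        spans.append((len(seq), len(seq) + len(exon)))
        seq += exon
        if i < len(introns):
            seq += introns[i]
    return seq, spans

def splice(pre_mrna, exon_spans, skipped=()):
    """Remove introns and join the retained exons in gene order."""
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exon_spans)
        if i not in skipped
    )

pre_mrna, spans = build_pre_mrna(EXONS, INTRONS)

constitutive = splice(pre_mrna, spans)            # all exons, gene order
exon2_skipped = splice(pre_mrna, spans, {1})      # alternative variant

print(constitutive)    # AUGGCUGAAUUCUGGUAA
print(exon2_skipped)   # AUGGCUUGGUAA
```

A single pre-mRNA therefore yields one mRNA under constitutive splicing and additional variants once exons are allowed to be skipped.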
1.5.3 Classes of Introns
Introns are regions that interrupt the coding region of functional RNA genes or protein-coding genes. There are four known classes of introns: Group I introns, Group II introns, nuclear pre-mRNA introns (spliceosomal introns), and transfer RNA (tRNA) introns. Group I introns are self-splicing introns and are found in some ribosomal RNA (rRNA) genes [43]. Group II introns are mobile ribozymes that self-splice from precursor RNAs (pre-RNAs) and are found in bacterial and organellar genomes, suggesting that catalytic RNAs, as informational structures, predate the origin of eukaryotes and perhaps the origin of cellular life [44, 45]. Nuclear pre-mRNA introns are found in protein-coding genes and require a ribonucleoprotein complex (the spliceosome) for splicing. The tRNA introns are found in various tRNA genes in all three domains of life and require specific enzymes for splicing [46].
1.5.4 Messenger RNA
The stochastic behavior of Brownian motion (random walk), together with concentration gradients, scatters the mRNA molecules from the origin of synthesis to other locations with a lower concentration. Thus, mRNA scattering (as is the case with many other molecules of different sizes and shapes) occurs naturally along the paths of least resistance. In the case of eukaryotes, the origin of pre-mRNA synthesis and processing is the inner space of the cell nucleus (high mRNA concentration), and the mRNA molecules diffuse through the nuclear pores into the relaxed environment of the cytoplasm (low mRNA concentration). In the case of prokaryotes, the mRNAs diffuse from the origin of synthesis (which is close to the DNA molecule floating directly in the cytoplasm) into the rest of the cytoplasm. The information from some mRNAs allows for protein synthesis, whereas the information from other RNAs provides direct biological functionality. Moreover, some RNA molecules become functional by themselves due to self-complementarity between different regions of the same molecule. Other RNA molecules become functional after they are processed by different proteins (or vice versa).
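The spreading of molecules away from their origin of synthesis can be illustrated with an unbiased random walk. The sketch below is a deliberately simplified assumption: movement is restricted to one dimension, every molecule takes a unit step left or right at each time step, and there are no barriers such as the nuclear envelope. The function names are hypothetical.

```python
import random

def simulate_walks(n_molecules, n_steps, seed=0):
    """Move each molecule one unit left or right per step, starting at position 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    positions = [0] * n_molecules
    for _ in range(n_steps):
        positions = [x + rng.choice((-1, 1)) for x in positions]
    return positions

def mean_squared_displacement(positions):
    """Average squared distance from the origin of synthesis."""
    return sum(x * x for x in positions) / len(positions)

early = simulate_walks(n_molecules=2000, n_steps=100)
late = simulate_walks(n_molecules=2000, n_steps=400)

print("MSD after 100 steps:", mean_squared_displacement(early))
print("MSD after 400 steps:", mean_squared_displacement(late))
```

For this kind of walk the mean squared displacement is expected to grow in proportion to the number of steps, so the molecules spread out from the high-concentration origin without any directed transport.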
1.5.5 mRNA to Proteins
In both eukaryotes and prokaryotes, mRNA molecules, which contain the information structure for protein synthesis, are stochastically encountered by two ribosomal subunits that initiate the translation step. Once bound to an mRNA transcript, the two subunits form the ribosome. The ribosome is a ribonucleoprotein (made of RNA and proteins) organelle that facilitates the formation of chemical bonds between amino acids in the order specified by the information encoded in the mRNA molecule. Life evolved a molecular scheme for translation, known as the “genetic code” [47]. In this scheme, groups of three nucleotides are associated with the different amino acids used for polypeptide synthesis. Each consecutive, nonoverlapping nucleotide triplet on the mRNA transcript is known as a codon. Polypeptide synthesis begins at a start codon, which sets the position of the reading frame. Usually, the start codon is the “AUG” triplet (the start codon with the highest frequency across all life). However, other triplets (non-AUG start codons) can take the role of a start codon at a lower frequency [48]. After initiation, the mRNA transcript slides between the two ribosomal subunits one codon at a time, following the reading frame set by the start codon [49, 50]. Different tRNAs, present in various concentrations in the cytoplasm, are each linked to an amino acid. The type of amino acid attached to a tRNA is associated with its anticodon, a special nucleotide triplet region of the tRNA that binds temporarily to the mRNA transcript. Thus, tRNAs are the temporary links between the mRNA transcript and the nascent amino acid chain. An assembled ribosome contains three “openings” (A, P, and E sites) for tRNA–mRNA interactions (Figure 1.3.b).
The smaller subunit of the ribosome allows for complementarity between three nucleotides (the codon) on the mRNA transcript and three nucleotides (the anticodon) of a tRNA molecule (Figure 1.3.b). Once the mRNA–tRNA binding has been facilitated by the smaller subunit, the transfer of the amino acid from the tRNA to the nascent amino acid chain is facilitated by the larger subunit of the ribosome [51]. The tRNA molecules with appropriate anticodons come into contact with the mRNA transcript through complementarity.
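Codon–anticodon complementarity can be sketched in a few lines. The example below assumes strict Watson–Crick pairing (A with U, G with C) and ignores wobble pairing and modified bases, which relax the rule at the third codon position in real tRNAs. Both sequences are written 5' to 3', so the anticodon is the reverse complement of the codon. The function names are hypothetical.

```python
# Strict Watson-Crick RNA pairs; wobble pairing is deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Return the anticodon (5'->3') that pairs antiparallel with a codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

def is_complementary(codon, anticodon):
    """True if the anticodon pairs with every base of the codon."""
    return anticodon_for(codon) == anticodon

print(anticodon_for("AUG"))            # CAU
print(is_complementary("AUG", "CAU"))  # True
print(is_complementary("AUG", "UAC"))  # False: written in the wrong orientation
```

The last call shows why orientation matters: "UAC" is the base-by-base complement of "AUG", but it only matches the codon when read 3' to 5'.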
The amino acid chain is passed from the previous tRNA to the amino acid of the next incoming tRNA, extending the growing peptide by one amino acid at each transfer. Thus, the amino acid chain remains attached to the most recently bound tRNA and is not released until a termination codon (UAA, UAG, or UGA) appears in the mRNA transcript [56]. Since it is an evolved and still evolving scheme, small variations of the genetic code exist across different kingdoms of life, and these variations are central to the ultimate goal of bioinformatics (i.e., understanding how life works).
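The reading-frame logic described above (start codon, nonoverlapping triplets, termination codon) can be sketched as a short translation routine. This is a minimal sketch under stated assumptions: it uses the standard genetic code only, treats the first AUG as the sole start codon (non-AUG starts are ignored), and reports amino acids as one-letter codes with `*` marking a termination codon. The function name `translate` is hypothetical.

```python
# Standard genetic code, ordered by first, second, and third base in "UCAG" order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(mrna):
    """Translate from the first AUG until a termination codon or the end of the transcript."""
    start = mrna.find("AUG")          # the start codon sets the reading frame
    if start == -1:
        return ""                     # no start codon, no polypeptide
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":            # UAA, UAG, or UGA
            break
        peptide.append(residue)
    return "".join(peptide)

print(translate("GGAUGGCUGAAUUCUGGUAAGG"))  # MAEFW
```

Organisms or organelles that use a variant code would need a different `AMINO_ACIDS` string; only the table changes, not the reading-frame logic.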
1.5.6 Transfer RNA
On the other side of translation, an ancient group of enzymes sets the rules of the genetic code [57]. The aminoacyl–tRNA synthetases (tRNA-ligases) are a group of enzymes whose function is to attach the appropriate amino acid to its corresponding tRNA (Figure 1.3.c). Many of these enzymes recognize their tRNA molecules using the anticodon [58]. Consequently, there is one tRNA-ligase for each tRNA–amino acid pair. For instance, in humans there are twenty different types of aminoacyl–tRNA synthetases, one for each amino acid of the genetic code [59]. Some organisms lack the genes needed for all twenty aminoacyl–tRNA synthetases, yet they still use all twenty amino acids for protein synthesis. In such cases, a tradeoff is made in the complexity of a tRNA-ligase, such that one enzyme serves more than one tRNA–amino acid pair [60, 61]. Thus, the matching of a tRNA with an amino acid is based on additional properties exhibited by the tRNA, such as the geometry (shape) of the molecule, specific nucleotide positions along the tRNA chain, and so on [62].
1.5.7 Small RNA
RNAs