Saccharomyces cerevisiae (budding yeast) is an ideal example for this hypothesis [158]. Yeast colonies growing on solid media show distinct structural patterns in their three-dimensional multicellular organization, and these patterns are specific to each yeast strain [159]. Moreover, variations in multicellular organization appear to depend on environmental parameters, such as the position of surrounding cells, nutrient gradients, and temperature [160].
1.13.4 Chimerism and Mosaicism
The cooperation of eukaryotic cells is best observed in two phenomena, namely chimerism and mosaicism. Mosaicism is the presence, in different tissues of one individual, of two or more cell populations that originate from the same fertilized egg: usually one cell population with the original genotype (typically the majority) and other cell populations carrying slight variations of the original genotype. One of the mechanisms that lead to mosaicism is the activity of transposons (transposable elements) [161]. During embryonic development, the genotype of an organism can undergo various types of mutations, including transposon-induced mutations, across different cell lines. These mutations can occur late in embryogenesis, with marginal effects at the level of the organism, or early in embryonic development, with more pronounced effects [162]. Transposon-induced mutations represent a normal and nonrandom source of variability in multicellular organisms, leading to different phenotypic characteristics [161]. The classic example, however, is Barbara McClintock's work on maize (corn) kernels, in which transposons inactivate a gene required for kernel pigmentation, producing an easily recognizable phenotype (see the “horizontal gene transfer” subchapter above). Mosaicism can also result from other types of mutations. For example, in humans, Down syndrome is characterized by an additional copy of chromosome 21 (Trisomy 21); in a minority of cases, the extra chromosome 21 is attached to another chromosome, most often chromosome 14, through a Robertsonian translocation. The extra copy of chromosome 21 slightly changes the chromosomal territories in the cell nucleus and the way heterochromatin and euchromatin are distributed. This leads to unusual variations in the expression of certain genes, especially those located on the extra chromosome 21 and those located in neighboring chromosomal territories. In most cases, Trisomy 21 is present from the very beginning of embryogenesis.
However, such a mutation may also appear later in embryogenesis, which results in mosaic tissues, with some cells that are normal and others that carry an extra copy of chromosome 21 [163, 164]. Such mosaicism can go clinically unnoticed unless genetic analysis is performed on different tissues of the organism. Moreover, it has been suggested that a combination of germline and somatic Trisomy 21 mosaicism may be reasonably common in the general population [163]. During development, cells with different genotypes compete within the tissue population. This competition can lead (especially for mosaicism that appeared late in embryogenesis) to a situation in which cells with the original genotype completely marginalize the mosaic genotypes, or vice versa, depending on which is fitter for a specific function. Chimerism, on the other hand, results from the fusion of two or more zygotes, namely cells of different organisms that are orchestrated by common molecular signals to form a single body [165]. Chimerism occurs, to a greater or lesser extent, in many multicellular species and may be accompanied by genetic mosaicism in any of the genotypes that form the composite organism.
1.14 Conclusions
Biological literature is arguably the most sophisticated among all the sciences and can be particularly overwhelming. This chapter introduced some important concepts that provide an overview of living organisms, such as the emergence of life, classification, the number of species, the origins of eukaryotic cells, the endosymbiosis theory, organelles, reductive evolution, the importance of HGT, and the main hypotheses regarding the origin of eukaryotic multicellularity. Some of the biological concepts described here have wider implications. Genome-less organelles, such as hydrogenosomes, and processes such as HGT challenge life as we understand it. Endosymbionts illustrate the significance of the environment and show that life is distributed in a blurry, nonunitary context. In other words, endosymbiosis widens the boundaries of life and shows how difficult it is to draw a border between the life that resides inside the cell and the life that resides outside it. Moreover, HGT appears to connect all species on Earth to a greater or lesser extent. Considerable evidence suggests that some of these ancient elements (e.g. catalytic RNAs) continue to add or remove innovative mechanisms that support the continuous adaptation of different species (if not all).
2 Tree of Life: Genomes (II)
2.1 Introduction
Insight into the context of biological information is of utmost importance for different approaches in bioinformatics. The first part of the chapter discusses the units of measurement and explains the meaning of some of the notations used here. A few useful unit conversions, with accompanying algorithms, are also shown. Next, the eukaryotic and prokaryotic organisms with the largest and smallest genomes are presented in detail. Moreover, different computations performed for this chapter show the average genome size across the major kingdoms of life, as well as the average genome size of different organelles, plasmids, and viruses. Toward the end of the chapter, the average number of genes is compared with the average number of proteins across the main kingdoms of life. This analysis highlights the frequency of a process called alternative splicing, which allows certain eukaryotic genes to encode several different proteins.
2.2 Rules of Engagement
Genome size refers to the amount of DNA contained in a haploid genome (a single set of chromosomes). Genome size is expressed in base pairs (bp) and in multiples of this unit: kilobase pairs (1 kbp = 1000 bp), megabase pairs (1 Mbp = 1 000 000 bp), gigabase pairs (1 Gbp = 1000 Mbp), and so on. By definition, base pairs are discrete units; nonetheless, these units of measurement are also used to express averages, which need not be whole numbers. For single-stranded DNA (ssDNA) or RNA sequences, the unit of measurement is the nucleotide (nt), written as 1 000 000 nt, 1000 knt, 1 Mnt, 0.001 Gnt, and so on. However, more often than not, base pairs are written simply as bases when the context is understood (e.g. 1 000 000 b, 1000 kb, 1 Mb, 0.001 Gb, and so on). For instance, the notations “b,” “kb,” “Mb,” and “Gb” are used when referring to DNA/RNA sequences in text format.
FASTA files contain nucleic acid sequences written in the 5′–3′ direction. Technically, every nucleic acid sequence represented in FASTA format is single-stranded; however, through base complementarity, the opposite strand is implied, so the reference can be considered double-stranded. In this chapter, the GC% content is mentioned as an intuitive parameter for the overall composition of the genomes of different species. Note that the (C+G)% or GC% content represents the percentage of guanine and cytosine bases along a DNA or RNA sequence (e.g. a DNA/RNA fragment, a gene, an entire genome).
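The unit conventions and the GC% definition above can be made concrete with a short Python sketch. It assumes decimal (SI) prefixes (1 kbp = 1000 bp), which match the conversions given in the text; the same factors apply to nt, knt, Mnt, and Gnt. The function names `convert_length`, `gc_percent`, and `reverse_complement` are hypothetical, and `gc_percent` assumes that ambiguous symbols such as N are excluded from the denominator, which is one common convention rather than the only one. `reverse_complement` shows how the implied second strand of a FASTA record can be read, again in the 5′–3′ direction.

```python
# Decimal (SI) prefixes for sequence lengths: 1 kbp = 1000 bp, 1 Mbp = 10^6 bp, ...
# The same factors apply to nt, knt, Mnt and Gnt.
UNIT_FACTORS = {
    "bp": 1,
    "kbp": 10**3,
    "Mbp": 10**6,
    "Gbp": 10**9,
}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a sequence length between bp, kbp, Mbp and Gbp (hypothetical helper)."""
    # Normalize to base pairs first, then scale to the target unit.
    in_bp = value * UNIT_FACTORS[from_unit]
    return in_bp / UNIT_FACTORS[to_unit]


def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T/U)."""
    seq = sequence.upper()
    # Assumption: ambiguous symbols (N, '-', etc.) are excluded from the denominator.
    counted = [base for base in seq if base in "ACGTU"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def reverse_complement(sequence: str) -> str:
    """Return the complementary DNA strand, also written 5'->3'."""
    pairs = str.maketrans("ACGTacgt", "TGCAtgca")
    # Complement each base, then reverse so the new strand reads 5'->3'.
    return sequence.translate(pairs)[::-1]


if __name__ == "__main__":
    print(convert_length(1, "Mbp", "kbp"))      # 1000.0
    print(convert_length(32396, "Mbp", "Gbp"))  # 32.396
    print(gc_percent("ATGCGC"))
    print(reverse_complement("ATGCCA"))         # TGGCAT
```

Excluding N from the denominator matters for draft assemblies that contain long runs of unknown bases; dividing by the full sequence length instead would underestimate GC%.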
2.3 Genome Sizes in the Tree of Life
There is no direct correlation between the genome size of a species and the complexity of its phenotype. Nevertheless, intellectual curiosity about the size of genomes remains. Determination of genome size based on DNA sequencing data is one of the most accurate methods to date. To illustrate the lack of correlation between genome size and phenotype, the upper extremes can be considered here. As might be expected intuitively, eukaryotes have the largest genomes. Among animals, the amphibian Ambystoma mexicanum (the Mexican axolotl) has the largest genome sequenced to date. A. mexicanum has a genome size of 32 396 Mbp (about 32.4 Gb) and a body length that can reach up to 30 cm [166]. In plants, the record is held by Pinus lambertiana (27 603 Mbp) and Sequoia sempervirens (26 537 Mbp). P. lambertiana is the tallest and most massive pine species [167, 168]. S. sempervirens includes the tallest living trees on Earth (115.5 m, or 379 ft, in height) [169]. Among prokaryotes, Minicystis rosea and Sorangium cellulosum So0157-2 have the largest genomes. The genome of the bacterium M. rosea contains 16 Mbp of DNA (GC%: 69.1), the maximum genome size found in prokaryotes [170]. Second to this species is S. cellulosum So0157-2, with a genome of 14.78 Mbp (GC%: 72.1) [171]. As discussed in the previous chapter, endosymbiosis challenges the notion of the smallest genome necessary for life. The smallest prokaryotic genomes were found in different
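The genome sizes quoted above span more than three orders of magnitude and can be placed on a common scale with a few lines of Python. This is only a sketch: the values are the figures cited in this section, and `GENOME_SIZES_MBP`, `human_readable`, and `ranked` are hypothetical names introduced for illustration.

```python
# Genome sizes (in Mbp) as quoted in this section; not an exhaustive dataset.
GENOME_SIZES_MBP = {
    "Ambystoma mexicanum": 32396,
    "Pinus lambertiana": 27603,
    "Sequoia sempervirens": 26537,
    "Minicystis rosea": 16,
    "Sorangium cellulosum So0157-2": 14.78,
}


def human_readable(size_mbp: float) -> str:
    """Format a size given in Mbp, switching to Gbp at 1000 Mbp (1 Gbp)."""
    if size_mbp >= 1000:
        return f"{size_mbp / 1000:.2f} Gbp"
    return f"{size_mbp:.2f} Mbp"


def ranked(sizes: dict[str, float]) -> list[tuple[str, float]]:
    """Return (species, size) pairs sorted from the largest genome down."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for species, size in ranked(GENOME_SIZES_MBP):
        print(f"{species:32s} {human_readable(size)}")
    # Fold difference between the largest animal genome and the largest prokaryotic genome.
    fold = GENOME_SIZES_MBP["Ambystoma mexicanum"] / GENOME_SIZES_MBP["Minicystis rosea"]
    print(f"A. mexicanum / M. rosea: about {fold:.0f}-fold")
```

Running it lists A. mexicanum first (32.40 Gbp) and S. cellulosum So0157-2 last (14.78 Mbp). The axolotl genome is roughly 2000 times larger than that of M. rosea (32 396 / 16 = 2024.75), a gap that, as noted above, says nothing about relative phenotypic complexity.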