(17k). These categories amount to ∼359k DNA/RNA sequences at different assembly levels, of which the 341k sequences at assembly level “complete” were used to calculate the averages presented here. Filters were applied to obtain a clean data set: only the “complete chromosome” and “complete genome” assembly levels were considered for these calculations.
Moreover, the maximum values presented in the main text were extracted from these data and checked against the literature. The files containing the raw data can be found in the additional materials online. Important note: The number of samples shown in the last row of Table 1.4 can be misleading. Table 1.4 shows 252k prokaryote samples, whereas the cataloged prokaryotes in Table 1.1 total 12k species. The sample count exceeds the species count because, in the NCBI database, prokaryotes have more than one reference or representative genome per species. According to NCBI filters, around 3.2k of the prokaryote genomes are representative.
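As a minimal sketch, the filtering and averaging steps described above can be expressed in Python. The record fields (`group`, `assembly_level`, `size_bp`), the level labels, and the sample values are hypothetical; they do not reproduce the NCBI file format or the actual data. The use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

# Assumed labels for the two "complete" assembly levels kept by the filter.
COMPLETE_LEVELS = {"complete genome", "complete chromosome"}

# Hypothetical records; the values are illustrative, not real NCBI data.
records = [
    {"group": "prokaryotes", "assembly_level": "complete genome", "size_bp": 4_600_000},
    {"group": "prokaryotes", "assembly_level": "scaffold", "size_bp": 5_100_000},
    {"group": "prokaryotes", "assembly_level": "complete chromosome", "size_bp": 2_800_000},
    {"group": "viruses", "assembly_level": "complete genome", "size_bp": 30_000},
    {"group": "viruses", "assembly_level": "contig", "size_bp": 48_000},
    {"group": "viruses", "assembly_level": "complete genome", "size_bp": 50_000},
]


def summarize(records):
    """Return {group: (mean_mb, sd_mb, n)} using only complete assemblies."""
    sizes_by_group = {}
    for rec in records:
        # Discard every record whose assembly level is not "complete".
        if rec["assembly_level"] not in COMPLETE_LEVELS:
            continue
        # Convert base pairs to mega bases (1 Mb = 1 000 000 b).
        sizes_by_group.setdefault(rec["group"], []).append(rec["size_bp"] / 1e6)

    summary = {}
    for group, sizes in sizes_by_group.items():
        # The sample SD needs at least two values; fall back to 0.0 otherwise.
        sd = stdev(sizes) if len(sizes) > 1 else 0.0
        summary[group] = (mean(sizes), sd, len(sizes))
    return summary


summary = summarize(records)
for group, (avg, sd, n) in summary.items():
    print(f"{group}: AV = {avg:.2f} Mb, SD = ±{sd:.2f} Mb, samples = {n}")
```

Filtering before grouping keeps draft assemblies (scaffolds, contigs) from distorting the averages, which is the purpose of the filter described in the text.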
Figure 2.1 The average genome size. (a) The proportion of known species in each kingdom of life. (b) The tree of life with data on the main kingdoms of life; each kingdom is labeled with its average genome size and average GC% content. (c) The average organellar genome for the organelles investigated to date, sorted by GC%. (d) A comparison between mitochondria and chloroplasts. (e) A comparison between plasmids from bacteria, archaea, and eukaryotes. In charts (c–e), the left axis indicates the GC% and the right axis indicates the average genome size in mega base pairs (written here as Mb instead of Mbp, for brevity).
Table 2.1 The average genome size in the tree of life.
| | Eukaryotes | Prokaryotes | Plasmids | Organelles | Viruses |
|---|---|---|---|---|---|
| Genome size average (Mb) | | | | | |
| AV | 433.92 | 3.74 | 0.11 | 0.07 | 0.04 |
| SD | ±1160.87 | ±1.81 | ±0.23 | ±0.39 | ±0.43 |
| Average GC% content (%) | | | | | |
| AV | 41.92 | 48.72 | 45.91 | 36.05 | 45.34 |
| SD | ±10.90 | ±11.87 | ±11.32 | ±7.92 | ±9.27 |
| Samples | | | | | |
| Total | 12 039 | 252 029 | 21 801 | 16 388 | 38 431 |
The table shows the average genome size (AV) and the average GC% content in eukaryotes, prokaryotes, plasmids, organelles, and viruses (eukaryotic and prokaryotic). A smaller standard deviation (SD) indicates that the data are clustered more closely about the mean, while a larger SD indicates that the data are more spread out (larger variation in the data). The unit of length for DNA is the mega base (Mb). For instance, a DNA fragment of 1 million nucleotides (1 000 000 b) is 1 mega base (1 Mb), or 1000 kilo bases (1000 kb), in length. The last row (Samples) indicates how many sequenced genomes were used for these computations.
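The unit conversions and the reading of the SD given in the note above can be illustrated with a short Python sketch. The two data sets below are invented for illustration. The population standard deviation (`pstdev`) is used here as an assumption, since the note does not state which SD formula was applied in the table.

```python
from statistics import mean, pstdev

# Number of bases in each unit of DNA length.
BASES_PER_UNIT = {"b": 1, "kb": 1_000, "Mb": 1_000_000}


def convert(length, from_unit, to_unit):
    """Convert a DNA length between b, kb, and Mb."""
    return length * BASES_PER_UNIT[from_unit] / BASES_PER_UNIT[to_unit]


# 1 000 000 b expressed in Mb, and 1 Mb expressed in kb.
one_million_b_in_mb = convert(1_000_000, "b", "Mb")
one_mb_in_kb = convert(1, "Mb", "kb")

# The organelle average from the table (0.07 Mb) expressed in kb.
organelle_kb = convert(0.07, "Mb", "kb")

# Two invented data sets (in Mb) with the same mean but different spread.
tight = [2.9, 3.0, 3.1]
wide = [1.0, 3.0, 5.0]

print(f"tight: mean = {mean(tight):.2f} Mb, SD = ±{pstdev(tight):.2f} Mb")
print(f"wide:  mean = {mean(wide):.2f} Mb, SD = ±{pstdev(wide):.2f} Mb")
```

Both lists average 3 Mb, yet the second has a much larger SD, which is the sense in which a large SD signals a wide spread of genome sizes within a group.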
2.3.4 Observations on Data
Eukaryotic organisms show an average genome size of 434 Mb and prokaryotic organisms show an average genome size of 3.7 Mb. DNA-containing organelles (70 kb) and viruses (40 kb) show average genome sizes of the same order of magnitude. Plasmids (110 kb), on the other hand, contain on average about 1.6 times as much genetic material as organelles and about 2.8 times as much as viruses. Out of curiosity, a calculation can be made here on the reductive evolution of organelles. Considering the ancestry of the organelles, the average genome of prokaryotes (3.74 Mb) was taken as the reference system in this approach:

$$\mathrm{AOG\%} = \frac{\mathrm{AOG}}{\mathrm{ARG}} \times 100$$
where ARG is the average size of the reference genome (here, the average prokaryote genome) and AOG is the average size of the organellar genome. AOG% is the size of the AOG expressed as a percentage of the ARG. The AOG represents 1.8% of the ARG. The reductive evolution is then given by the reference value of 100% (the average genome of prokaryotes) minus 1.8%:

$$\mathrm{REv} = 100\% - \mathrm{AOG\%}$$
where REv represents the reduction of the AOG since the first endosymbiosis occurred (2 billion to 1.5 billion years ago). Thus, during this period, the AOG underwent a reductive evolution of 98%. Note that genomes fluctuate in size over long periods of time and that reductive evolution is not necessarily a “one-way street” [181]. The average GC% content was also calculated. The average GC% differs by about 7 percentage points between prokaryotes (48.72%) and eukaryotes (41.92%). Plasmids and viruses show a close GC% average of ∼45% (Table 2.1).
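The two-step calculation of the reductive evolution described above can be retraced with a few lines of Python. This is a sketch only: the function names are illustrative, and the inputs are the rounded averages from Table 2.1. With these rounded inputs the AOG% comes out slightly above the 1.8% quoted in the text; the difference presumably reflects rounding or truncation, which is an assumption here.

```python
# Rounded averages taken from Table 2.1 (in Mb).
ARG_MB = 3.74  # average reference genome: the average prokaryote genome
AOG_MB = 0.07  # average organellar genome


def aog_percent(aog, arg):
    """Size of the AOG expressed as a percentage of the ARG."""
    return aog / arg * 100.0


def reductive_evolution(aog, arg):
    """REv: the reference value of 100% minus AOG%."""
    return 100.0 - aog_percent(aog, arg)


aog_pct = aog_percent(AOG_MB, ARG_MB)
rev = reductive_evolution(AOG_MB, ARG_MB)

print(f"AOG% = {aog_pct:.2f}%")
print(f"REv  = {rev:.2f}%")
```

Either way, the REv rounds to the 98% reported in the text.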
The similar average GC% of plasmids and viruses is not entirely surprising, since plasmid-to-virus transition scenarios have been proposed in the past [182]. Prokaryotic and eukaryotic ssDNA viruses have their origin in bacterial and archaeal plasmids [183]. Plasmid propagation by virus-like particles was observed in the saline waters of cold environments. For instance, a plasmid from an Antarctic haloarchaeon uses specialized membrane vesicles to disseminate and infect plasmid-free cells [184]. The average GC% of organellar genomes (36.05%) is closer to the average observed in eukaryotic genomes (41.92%) than to the average observed in prokaryotic genomes (48.72%). In prokaryotes, the average genome size and the average GC% content were also calculated separately for bacteria and archaea (Table