Paul A. Gagniuc

Algorithms in Bioinformatics



(17k). These categories amount to ∼359k DNA/RNA sequences at different assembly levels, of which the 341k samples at assembly level “complete” were used to calculate the averages presented here. In other words, filters were applied to obtain a clean data set: only entries labeled “complete chromosomes” or “complete genomes” were considered for these calculations.
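The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the level labels in `COMPLETE_LEVELS`, and the function name `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical toy records, not real data:
# (name, assembly level, genome size in bases, GC%).
RECORDS = [
    ("genome_A", "complete genome", 4_000_000, 50.0),
    ("genome_B", "complete chromosome", 5_000_000, 40.0),
    ("genome_C", "scaffold", 2_000_000, 55.0),
    ("genome_D", "complete genome", 3_000_000, 60.0),
]

# Assumed labels for the assembly levels that pass the filter.
COMPLETE_LEVELS = {"complete genome", "complete chromosome"}


def summarize(records):
    """Keep only complete assemblies, then average genome size (Mb) and GC%."""
    kept = [r for r in records if r[1] in COMPLETE_LEVELS]
    sizes_mb = [size / 1_000_000 for _, _, size, _ in kept]
    gc_values = [gc for _, _, _, gc in kept]
    return {
        "samples": len(kept),
        "size_av": mean(sizes_mb),
        "size_sd": stdev(sizes_mb),  # sample SD (assumption)
        "gc_av": mean(gc_values),
        "gc_sd": stdev(gc_values),   # sample SD (assumption)
    }


summary = summarize(RECORDS)
print(summary)
```

The scaffold-level record is dropped before any statistic is computed, which mirrors the idea of cleaning the data set first and averaging afterwards.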

Schematic illustration of the average genome size. (a) The proportion of known species in each kingdom of life. (b) The tree of life with data on the main kingdoms of life; each kingdom is labeled with its average genome size and its average GC% content. (c) The average organellar genome for the organelles investigated to date, sorted by GC%. (d) A comparison between mitochondria and chloroplasts. (e) A comparison between plasmids from bacteria, archaea, and eukaryotes. For each chart (c–e), the left axis indicates the GC% and the right axis indicates the average genome size expressed in mega base pairs.
                           Eukaryotes   Prokaryotes   Plasmids   Organelles   Viruses
Genome size average (Mb)
  AV                       433.92       3.74          0.11       0.07         0.04
  SD                       ±1160.87     ±1.81         ±0.23      ±0.39        ±0.43
Average GC% content (%)
  AV                       41.92        48.72         45.91      36.05        45.34
  SD                       ±10.90       ±11.87        ±11.32     ±7.92        ±9.27
Samples
  Total                    12 039       252 029       21 801     16 388       38 431

      The table shows the average genome size and the average GC% content in eukaryotes, prokaryotes, plasmids, organelles, and viruses (eukaryotic and prokaryotic). Note that a smaller standard deviation (SD) indicates that more of the data are clustered about the mean, while a larger SD indicates that the data are more spread out. The unit of length for DNA is the megabase (Mb). For instance, a DNA fragment of 1 million nucleotides (1 000 000 b) is 1 megabase (1 Mb), or 1000 kilobases (1000 kb), in length. The last row (Samples) indicates how many sequenced genomes were used for these computations.
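      The unit relationships and the meaning of the SD in the note above can be checked with a short Python sketch. The helper names and the two toy data sets are hypothetical and chosen only for illustration; the population SD is used here simply because the toy sets are treated as complete.

```python
from statistics import mean, pstdev

BASES_PER_KB = 1_000
BASES_PER_MB = 1_000_000


def bases_to_kb(n_bases):
    """Convert a length in bases (b) to kilobases (kb)."""
    return n_bases / BASES_PER_KB


def bases_to_mb(n_bases):
    """Convert a length in bases (b) to megabases (Mb)."""
    return n_bases / BASES_PER_MB


def mb_to_kb(mb):
    """Convert a length in megabases (Mb) to kilobases (kb)."""
    return mb * 1_000


# 1 000 000 b is 1 Mb, or 1000 kb.
print(bases_to_mb(1_000_000), "Mb =", bases_to_kb(1_000_000), "kb")

# Average sizes from the table, converted from Mb to kb.
for label, size_mb in [("plasmids", 0.11), ("organelles", 0.07), ("viruses", 0.04)]:
    print(label, round(mb_to_kb(size_mb)), "kb")

# Two toy data sets with the same mean but a different spread.
tight = [9, 10, 11]
wide = [1, 10, 19]
print(mean(tight), pstdev(tight))  # values clustered about the mean
print(mean(wide), pstdev(wide))    # values spread far from the mean
```

      Both toy sets have a mean of 10, yet the second has a much larger SD, which is the situation the note describes for widely varying genome sizes.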

      2.3.4 Observations on Data

      Eukaryotic organisms show an average genome size of 434 Mb, whereas prokaryotic organisms show an average genome size of 3.7 Mb. DNA-containing organelles (70 kb) and viruses (40 kb) show fairly similar average genome sizes. Plasmids (110 kb), on the other hand, carry more genetic material on average: roughly 1.6 times the organelle average and about 2.8 times the viral average. Out of curiosity, a calculation can be made here on the reductive evolution of organelles. Considering the ancestry of the organelles, the average genome size of prokaryotes (3.74 Mb) is taken as the reference in this approach:

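\[
\mathrm{REv} = \left(1 - \frac{\mathrm{AOG}}{3.74\ \mathrm{Mb}}\right) \times 100
             = \left(1 - \frac{0.07}{3.74}\right) \times 100 \approx 98\%
\]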

      where REv represents the reduction of the average organelle genome (AOG, 0.07 Mb) since the first endosymbiosis occurred (2 billion to 1.5 billion years ago). Thus, during this period, the AOG underwent a reductive evolution of about 98%. Note that genomes fluctuate in size over long periods of time, and reductive evolution is not necessarily a “one-way street” [181]. The average GC% content was also calculated. It differs noticeably between prokaryotes (48.72%) and eukaryotes (41.92%), a gap of about 6.8 percentage points. Plasmids and viruses show close GC% averages of ∼45% (Table 2.1).
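      The reductive-evolution estimate can be reproduced numerically. The sketch below assumes that REv is the percentage by which the average organelle genome is smaller than the prokaryotic reference, an interpretation that agrees with the 98% quoted in the text; the function name `reductive_evolution` is hypothetical.

```python
def reductive_evolution(organelle_mb, reference_mb):
    """Percentage reduction of the organelle genome relative to a reference."""
    if reference_mb <= 0:
        raise ValueError("reference genome size must be positive")
    return (1 - organelle_mb / reference_mb) * 100


AOG_MB = 0.07         # average organelle genome, from the table
PROKARYOTE_MB = 3.74  # average prokaryote genome, used as the reference

rev = reductive_evolution(AOG_MB, PROKARYOTE_MB)
print(f"REv = {rev:.1f}%")
```

      Because the inputs are averages rounded to two decimals, the result is best read as an order-of-magnitude statement (roughly 98%) rather than a precise measurement.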