The table compares the average genome size, the average GC content (GC%), the average number of genes, and the average number of resulting proteins. Note that DNA length is expressed in megabases (Mb). A DNA fragment of 1 million nucleotides (1 000 000 b) is 1000 kilobases (1000 kb), 1 megabase (1 Mb), or 0.001 gigabases (0.001 Gb) in length. For instance, an average genome size of 1493.6 Mb is 1.4936 Gb (∼1.5 Gb).
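These conversions come down to multiplying or dividing by powers of 1000. The following is a minimal sketch of that arithmetic. The helper `convert_length` and the lookup table `UNITS` are hypothetical names introduced only for this illustration.

```python
# Number of bases in each DNA length unit (1 kb = 10^3 b, 1 Mb = 10^6 b, 1 Gb = 10^9 b).
UNITS = {"b": 1, "kb": 10**3, "Mb": 10**6, "Gb": 10**9}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a DNA length between b, kb, Mb and Gb (hypothetical helper)."""
    bases = value * UNITS[from_unit]  # express the length in bases first
    return bases / UNITS[to_unit]     # then rescale to the target unit


# A fragment of 1 000 000 b expressed in the larger units.
print(convert_length(1_000_000, "b", "kb"))  # 1000.0
print(convert_length(1_000_000, "b", "Mb"))  # 1.0
print(convert_length(1_000_000, "b", "Gb"))  # 0.001

# The average genome size used as an example in the text, in Gb.
print(round(convert_length(1493.6, "Mb", "Gb"), 4))  # 1.4936
```

The `round` call only hides binary floating-point noise; the underlying value is 1.4936 Gb up to rounding error.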
The average animal proteome is 45% more diverse than the average number of animal genes. Using the same formula as above, the average plant proteome is 16% more diverse than the average number of known genes. In contrast, the average proteomes of fungi, protists, and prokaryotes are moderately undersized. As before, unity (a value of 1) is divided by the average number of genes (G), and the result is multiplied by the average number of proteins (P). However, since in this case the proteome is smaller than the number of genes, the result is subtracted from 1 (unity), as follows:
$$S = 1 - \left( \frac{1}{G} \times P \right)$$
where S is the part of the proteome that would be expected under a “one gene – one protein” correspondence but is missing. Thus, the average fungal proteome contains ∼10% fewer types of proteins than the average number of genes. This suggests that about 10% of fungal genes may encode functional RNAs rather than proteins. The situation is similar for protists, where about 5% of genes could encode functional RNAs and the remaining 95% encode proteins. In prokaryotes, about 93% of archaeal and bacterial genes encode proteins, and the remaining 7% could encode functional RNAs.
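Both calculations can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. The function names `proteome_excess` and `proteome_shortfall` are hypothetical, and the gene and protein counts in the usage lines are invented round numbers used only to show the arithmetic; they are not values from Table 2.6. Expressing "X% more diverse" as ((1/G) × P − 1) × 100 is our reading of the "more diverse" percentages quoted above.

```python
def proteome_excess(genes: float, proteins: float) -> float:
    """Percent by which the proteome exceeds the gene count: ((1/G) * P - 1) * 100."""
    return ((1 / genes) * proteins - 1) * 100


def proteome_shortfall(genes: float, proteins: float) -> float:
    """Percent S of the expected proteome that is missing: S = 1 - (1/G) * P.

    Meant for the undersized case (fewer proteins than genes), assuming a
    "one gene - one protein" correspondence.
    """
    return (1 - (1 / genes) * proteins) * 100


# Hypothetical averages, chosen only to illustrate the two formulas.
print(f"{proteome_excess(10_000, 14_500):.1f}% more diverse")  # 45.0% more diverse
print(f"{proteome_shortfall(10_000, 9_000):.1f}% undersized")  # 10.0% undersized
```

The two functions are mirror images: when P > G the excess is positive, and when P < G the shortfall S is positive, which is why the text switches to subtracting from unity for fungi, protists, and prokaryotes.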
Although informative, an undersized proteome does not rule out alternative splicing or protein splicing in any of these kingdoms. Animals and plants show the most diverse proteomes, well above the average number of genes (Table 2.6). Individual species may show a proteome diversity that is particularly high relative to these averages. For instance, in plants, Triticum durum (macaroni wheat) contains ∼63k genes and a proteome of ∼190k proteins. Following the same reasoning as above, the proteome of T. durum is ∼197% more diverse than its number of genes. A significant difference can also be found in animals. Current NCBI data show that the human genome contains ∼60k genes (the list of annotated features includes protein-coding genes, noncoding genes, and pseudogenes) and a proteome of ∼120k proteins (H. sapiens GRCh38.p13). The proteome of H. sapiens is ∼95% more diverse than its number of genes.
Note, however, that any discussion of the number of human genes and proteins should be approached with caution, because these figures change over time. In the literature, the number of genes and proteins for H. sapiens varies depending on different conventions and/or advances in bioinformatics [234–236]. Why is there so much uncertainty about the number of genes and proteins? All genes are predicted by bioinformatic means. Many predictions are then verified by aligning sequenced mRNAs against a reference genome. However, many genes are expressed only under special conditions, over certain periods of time, or only once in a lifetime. Their mRNAs may therefore go undetected and unsequenced, leaving the bioinformatic predictions unconfirmed. In addition, many genes may overlap, and gene promoters often show bidirectional activity [237–239]. Such elusive genes are difficult to locate with certainty, and other genes will likely remain difficult to detect in the future. Moreover, many results derived from large-scale experiments (e.g. genome studies) fall directly under the umbrella of chaos theory: small changes in the initial parameters of different algorithms can lead to large variations in the final predictions. This has already been evident over time in the case of the human genome [234–236].
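The T. durum and H. sapiens comparisons in this section can be approximately reproduced from the rounded counts quoted in the text. The sketch below reuses the "more diverse" calculation ((1/G) × P − 1) × 100. The dictionary `species` and the function `proteome_excess` are hypothetical names for this illustration only.

```python
# Approximate gene and protein counts quoted in the text (rounded to thousands).
species = {
    "Triticum durum": (63_000, 190_000),
    "Homo sapiens (GRCh38.p13)": (60_000, 120_000),
}


def proteome_excess(genes: float, proteins: float) -> float:
    """Percent by which the proteome exceeds the gene count: ((1/G) * P - 1) * 100."""
    return ((1 / genes) * proteins - 1) * 100


for name, (genes, proteins) in species.items():
    print(f"{name}: {proteome_excess(genes, proteins):.0f}% more diverse")
```

With these rounded inputs the script reports about 202% for T. durum and 100% for H. sapiens, slightly above the ∼197% and ∼95% given in the text. The text's values presumably come from the unrounded NCBI counts, a small-scale reminder of how sensitive such figures are to their inputs.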
2.9 Conclusions
Bioinformatics is the field that will perhaps lead to a clearer understanding of both the origins and the current mechanisms of life. This introductory chapter has provided the context needed for a better understanding of the different approaches used in bioinformatics and, possibly, for new ideas “just waiting” to be implemented in the future. First, the chapter described the units of measurement used throughout the book and presented a series of useful conversions, followed by a discussion of the organisms with the largest and smallest genomes. Second, the average genome sizes in different kingdoms of life were calculated and discussed. In this context, the global features of viral genomes, plasmid DNA, and the various genome-containing organelles found in eukaryotic organisms were briefly described. Furthermore, viroids were mentioned in connection with the properties of catalytic RNAs. Toward the end of the chapter, the discussion gradually shifted from catalytic RNAs to RNA splicing. The prevalence of RNA splicing was then highlighted by comparing the average number of genes and proteins in the main kingdoms of life.