Brenda A. Wilson

Bacterial Pathogenesis



Rev Microbiol 13(6):360–372, with permission.

      Assembling individual genome sequences from many thousands of sequences is still challenging for existing bioinformatics programs, but rapid advances in the analysis of the huge volume of newly emerging sequence data are beginning to make what seems unimaginable today feasible tomorrow (in biotechnology, as in most modern scientific endeavors, “impossible” just means “not possible yet”). Interpreting a metagenomic analysis is much more complex than interpreting a 16S rRNA analysis, or even a whole-genome assembly and analysis, and the details are beyond the scope of this textbook; for the adventurous, several examples of recent metagenomic analyses of human microbiotas are provided in the suggested readings. What these analyses offer is a glimpse of the types of biosynthetic and metabolic pathways that the microbes in any given population might have at their disposal.

      To help tackle the daunting task of defining our microbiomes, the National Institutes of Health launched the Human Microbiome Project (HMP) in 2007 with the objective of sequencing the collective genomes of all of the microbes (bacteria, archaea, fungi, protozoa, and viruses) that make up the microbiotas of five targeted sites in the human body: the skin, mouth, nasal passages/lungs, vagina, and gut. The goal of the HMP was to lay the groundwork for establishing what constitutes a healthy microbiota, so as to understand how changes in microbial composition, in terms of diversity and richness of content, affect health and disease. A large part of the HMP effort was to establish a repository of reference microbial genome sequences that could be used to facilitate the interrogation of microbiomes.

      Metagenomic analysis starts with the census information indicating which species are present in a microbial population, and then adds information about metabolic function gleaned from the thousands of complete bacterial genomes already deposited in genome databases, largely established through the HMP effort (see Box 5-1). These resources can provide a vast amount of information about the metabolic potential of a particular microbiome by comparing the metabolic pathway sequences present in that mixed bacterial population with those from annotated genomes of taxonomically related bacteria collected through the HMP. Because most of the completed genomes were each sequenced from a single clone of a microbe, most of their metabolic pathways and gene functions have already been ascertained, either directly from biochemical analyses or by inference based on analogy to similar genes in other microbes.

      A shared limitation of the 16S rRNA gene and metagenomic analysis approaches is that they do not reveal which genes are being expressed at any given time or what the functional activity of the community is under any particular condition. Many bacteria that normally reside in or on the body may be expelled into the environment after traveling through the stomach and intestinal tract from their site of origin, so it stands to reason that the site they normally occupy does not explain the presence of all of their genes: they also have to endure stresses unrelated to the environment of that site. Thus, only a subset of genes is likely to be expressed at any one time or at any particular site. Moreover, even within the same site, changes in conditions, such as changes in diet or hormonal levels, may increase or decrease expression of certain sets of genes. Work is now underway to capture the functional status of the microbial community by combining other types of functional information, including gene, RNA, and protein expression levels, activity status, and metabolite flux.

      RNA-Seq Profiling (Transcriptomics). Gene expression in complex populations can be measured using techniques that detect and quantitate mRNA levels, including qPCR (see Figure 5-5) and RNA-seq technology (Figure 5-12). The Illumina RNA-seq paired-end sequencing platform has become the method of choice for interrogating the abundance and diversity of RNA transcripts (transcriptomics). The first step of RNA-seq is to extract and purify total RNA from the bacterial samples that will be compared. Different RNA purification methods are used for mRNA and for small RNAs (sRNAs) of less than 100 nucleotides. Highly abundant rRNAs and tRNAs are removed, and the mRNA samples are physically fragmented into smaller pieces. Reverse transcription is then performed to convert the RNA fragments into cDNA, and the resulting cDNA is ligated to adapters whose barcode sequences mark cDNAs from different RNA samples. The adapter-ligated cDNAs are then subjected to library amplification and Illumina sequencing, as illustrated in Figure 5-6.
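
      Because each sample's cDNAs carry their own barcode, reads from a pooled sequencing run can be sorted back into samples afterward. The following Python sketch illustrates the idea under simplifying assumptions that are not from the text: the barcode is taken to be the first four bases of each read, and the barcode table and sample names are hypothetical. Real Illumina workflows typically read index sequences separately and tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (illustrative only).
BARCODES = {"ACGT": "wild_type", "TGCA": "mutant"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by sample, assuming each read starts with its barcode.

    The barcode is trimmed off; reads whose barcode is not in the table
    are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy reads: barcode (4 bases) followed by a short cDNA-derived sequence.
reads = ["ACGTGGATCC", "TGCAGGATCA", "ACGTTTAGGC", "NNNNGGGGGG"]
sorted_reads = demultiplex(reads, BARCODES)
```

      Here `sorted_reads` maps each sample name to the barcode-trimmed reads that belong to it, so each sample can then be analyzed separately.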

      Figure 5-12. RNA-seq technology. The Illumina RNA-seq platform enables global profiling of the transcriptional responses of all genes in individual cells or tissues at considerable depth of coverage under multiple conditions or over time. (A) Steps in an RNA-seq experiment using Illumina sequencing described in the text and Figure 5-6. (B) Outcomes of aligning multiple separate, short (150–300 nucleotides) sequencing reads, which correspond to overlapping segments of mRNA or sRNA. Left, if a reference genome is not available for a bacterium, the separate overlapping reads can be aligned to show the length of a transcript. The transcript can then be analyzed for open reading frames (ORFs) or other features. Middle, if a reference genome is available, the aligned fragments can be compared to genomic features, such as ORFs, intercistronic regions, or predicted promoters and transcription terminators. Right, the number of reads for each nucleotide base in a transcript is proportional to the starting amount of mRNA or sRNA present in the sample. Hence, following normalization, relative transcript amounts can be determined in different bacterial strains or samples, such as in a wild-type bacterial strain compared to a mutant strain or a wild-type strain grown under an unstressed versus a stressed condition. See the text for additional details. (C) The number of nucleotide base reads (blue and red regions) is often constant across regions that are cotranscribed. Therefore, the read patterns across genes indicate monocistronic versus multicistronic operons, and also the presence of any sRNA, putative regulatory RNAs, or antisense RNAs. See the text for additional details. (D) An example of changes in relative transcript amounts determined by RNA-seq. An Escherichia coli strain was grown in medium lacking or containing a glucose analogue, α-methylglucoside (αMG). RNA-seq analysis showed that the relative transcript amounts of seven metabolic genes increased (left) and three metabolic genes decreased (right) in the bacteria treated with αMG compared to the untreated control. Independent quantitative reverse-transcription polymerase chain reaction (qRT-PCR) experiments confirmed the trend in expression changes detected by RNA-seq. Panel D adapted from McClure R, Balasubramanian D, Sun Y, Bobrovskyy M, Sumby P, Genco CA, Vanderpool CK, Tjaden B. 2013. Nucleic Acids Res 41(14):e140, with permission.
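
      The caption notes that relative transcript amounts are compared after normalization but does not name a method. As a minimal sketch, the Python example below assumes one common choice, scaling each sample's read counts to counts per million (CPM), and then reports the log2 ratio of treated to control. The gene names, read counts, and pseudocount are hypothetical illustration values, not data from the figure.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of normalized expression (treated / control) per gene.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for genes with no reads in one sample.
    """
    cpm_t = counts_per_million(treated)
    cpm_c = counts_per_million(control)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }


# Hypothetical raw read counts; the treated library is twice as deep.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 800, "geneB": 400, "geneC": 800}

lfc = log2_fold_change(treated, control)
```

      In this toy data set, geneB has the same raw count in both samples, yet its normalized level is lower in the treated sample because that library contains twice as many reads overall. This is why raw read counts are not compared directly between samples.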

      The output of the Illumina sequencing is millions of short sequence reads of about 150–300 nucleotides that correspond to regions of transcribed mRNA and sRNA molecules in the original samples. Multiple samples can be sequenced simultaneously in each Illumina sequencing run, and the sequence reads can be sorted by the unique barcode sequences in the adapters ligated to the cDNAs of each sample. Because mRNA is randomly fragmented in this procedure, a series of overlapping reads covering the length of each mRNA emerges, and each nucleotide base in the mRNA is determined multiple times in separate reads. The total number of reads of each nucleotide base is referred to as coverage, and for most mRNA molecules the coverage of each nucleotide base is approximately equal. Keep in mind, however, that the output of a single RNA-seq experiment includes reads from all of the separate mRNAs and sRNAs expressed in a bacterium.
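
      Coverage as defined above can be computed directly once the position of each read along a transcript is known. The Python sketch below is illustrative only: it assumes the reads have already been aligned and are given as 0-based, half-open (start, end) intervals that lie within the transcript, and the read positions are made up.

```python
def per_base_coverage(read_intervals, transcript_length):
    """Return the number of reads covering each base of a transcript.

    Uses a difference array: +1 where a read starts, -1 just past its
    end, then a running sum gives the coverage at each position.
    Intervals are assumed to satisfy 0 <= start < end <= transcript_length.
    """
    diff = [0] * (transcript_length + 1)
    for start, end in read_intervals:
        diff[start] += 1
        diff[end] -= 1

    coverage = []
    running = 0
    for delta in diff[:transcript_length]:
        running += delta
        coverage.append(running)
    return coverage


# Four overlapping toy reads on a 10-nucleotide transcript.
reads = [(0, 5), (2, 8), (4, 10), (6, 10)]
coverage = per_base_coverage(reads, 10)
```

      Each entry of `coverage` is the number of separate reads in which that nucleotide base was determined. The sum of all entries equals the combined length of the reads, which is a quick consistency check.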

      Because of the immense volume of data acquired, computational bioinformatic analyses are required to align and display the separate mRNAs and sRNAs from RNA-seq experiments. From the separate reads, three kinds of analyses are possible (Figure 5-12B). If a genomic DNA reference sequence is not available, the overlapping reads can be aligned to indicate the length and sequence of individual mRNA molecules, which may contain open reading frames (Figure 5-12B, left). If a reference genomic DNA sequence is