Various authors

Genotyping by Sequencing for Crop Improvement



used to screen for allele diversity by PCR in ten genotypes, and the amplicons were sequenced and compared to identify SNPs. SNPs were also identified by mining the large number of sequences in EST databases, which have grown with improved sequencing technologies (Soleimani et al. 2003). These SNPs were then validated by PCR (Batley et al. 2003). These approaches mainly identified gene‐based SNPs, but the frequency of such SNPs is generally low. In addition, SNPs located in low‐copy noncoding regions and intergenic spaces could not be identified.
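The amplicon-comparison step described above can be illustrated with a minimal Python sketch. It assumes that the amplicon sequences from each genotype have already been aligned to a common length, with `-` marking gaps. The function name `find_snps` and the four toy sequences are hypothetical. A real screen, such as the ten-genotype study above, would also need to account for base quality and sequencing errors.

```python
from collections import Counter


def find_snps(aligned: dict[str, str]) -> list[tuple[int, dict[str, int]]]:
    """Report polymorphic columns among aligned amplicon sequences.

    Assumption: all sequences are pre-aligned to the same length, '-' marks
    a gap and 'N' an ambiguous base. Such columns are skipped because they
    reflect indels or uncertain calls rather than single-base substitutions.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    seqs = [seq.upper() for seq in aligned.values()]
    snps = []
    for i in range(lengths.pop()):
        column = [seq[i] for seq in seqs]
        if any(base in ("-", "N") for base in column):
            continue  # gap or ambiguous base: not a clean SNP column
        counts = Counter(column)
        if len(counts) > 1:  # more than one base observed across genotypes
            snps.append((i + 1, dict(counts)))  # report 1-based position
    return snps


# Hypothetical toy amplicons from four genotypes (not real data).
amplicons = {
    "G1": "ACGTACGTAC",
    "G2": "ACGTACGTAC",
    "G3": "ACGAACGTAC",
    "G4": "ACGTACG-AC",
}
print(find_snps(amplicons))  # [(4, {'T': 3, 'A': 1})]
```

The gap at position 8 in G4 is deliberately ignored. Sequence comparison of this kind finds only substitutions, and indels need separate handling.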

      Several assays have been developed for genotyping identified SNPs, including allele‐specific hybridization, primer extension, oligonucleotide ligation, and invasive cleavage (Sobrino et al. 2005). DNA chips, allele‐specific PCR, and primer extension are also attractive options because they are suitable for automation and can be used to develop dense genetic maps. Allele‐specific hybridization was used to identify polymorphisms in 570 soybean genotypes (Coryell et al. 1999).
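Allele‐specific hybridization assays typically yield one signal per allele‐specific probe. Genotypes are then inferred from the relative strength of the two signals. The sketch below assumes a simple fixed‐threshold rule based on the fraction of the total signal that comes from allele B. The cutoffs (`min_total`, `homo_cut`, `het_window`) and the function name are hypothetical. Real platforms usually derive calls by clustering many samples rather than by applying fixed thresholds.

```python
def call_genotype(
    signal_a: float,
    signal_b: float,
    min_total: float = 100.0,
    homo_cut: float = 0.2,
    het_window: tuple[float, float] = (0.35, 0.65),
) -> str:
    """Call a biallelic genotype from two allele-specific probe signals.

    All thresholds are hypothetical defaults chosen for illustration only.
    """
    total = signal_a + signal_b
    if total < min_total:
        return "NoCall"  # too little hybridization signal overall
    frac_b = signal_b / total  # share of signal from the allele-B probe
    if frac_b <= homo_cut:
        return "AA"
    if frac_b >= 1 - homo_cut:
        return "BB"
    if het_window[0] <= frac_b <= het_window[1]:
        return "AB"
    return "NoCall"  # ratio falls between clusters: ambiguous


# Hypothetical (allele A, allele B) signal pairs.
for a, b in [(950, 50), (500, 520), (40, 960), (700, 300), (30, 20)]:
    print(a, b, call_genotype(a, b))
```

Leaving the gaps between the homozygous and heterozygous windows as "NoCall" trades some lost data for fewer miscalled genotypes.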

      1.5.1 Genotyping‐by‐Sequencing (GBS)

Figure: Schematic illustration of an example GBS workflow and GBS data analysis pipeline for identifying SNP markers.

      GBS is widely used to capture SNPs and other marker variation by NGS. It has overtaken conventional genotyping with traditional markers such as RAPD, AFLP, and SSR in terms of time, labor, and cost. For example, GBS can generate data for thousands of markers across a large population in a week, and these data can be analyzed within a month (Bhatia et al. 2018). The approach has been used to map several economically important traits in a number of crop plants (Poland and Rife 2012). Most developing countries have in‐house computational facilities that are used for GBS analysis. A few online servers, such as CyVerse (www.cyverse.org), also provide built‐in pipelines for GBS analysis. However, these servers cannot handle large datasets, and the speed of analysis depends on the internet connection. Aligning NGS reads and calling SNPs and indels are the two major steps in GBS analysis. Several publicly available pipelines perform these steps, including Stacks, IGST, GB‐eaSy, TASSEL‐GBS, Fast‐GBS, and UNEAK (Wickland et al. 2017).
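The two steps named above, read alignment followed by variant calling, can be sketched as a toy Python caller that works on reads that have already been aligned. It assumes ungapped alignments, with each read given as a 0-based start position and a sequence, so it detects SNPs only and not indels. The thresholds `min_depth` and `min_alt_frac` are illustrative assumptions and are not defaults of Stacks, TASSEL‐GBS, or any other pipeline listed above.

```python
from collections import Counter, defaultdict


def call_snps(
    reference: str,
    reads: list[tuple[int, str]],
    min_depth: int = 4,
    min_alt_frac: float = 0.2,
) -> list[tuple[int, str, str, int, float]]:
    """Toy SNP caller over ungapped, pre-aligned reads.

    Each read is (0-based start on the reference, sequence). Indels are not
    modelled. min_depth and min_alt_frac are hypothetical values.
    Returns (1-based position, ref base, alt base, depth, alt fraction).
    """
    # Step 1: build a pileup of base counts at each reference position.
    pileup: dict[int, Counter] = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < len(reference) and base != "N":
                pileup[pos][base] += 1

    # Step 2: call a SNP where coverage and non-reference support suffice.
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        ref_base = reference[pos].upper()
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # all reads agree with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        alt_frac = alt_n / depth
        if alt_frac >= min_alt_frac:
            calls.append((pos + 1, ref_base, alt_base, depth, round(alt_frac, 3)))
    return calls


# Hypothetical toy reference and reads (not real data).
reference = "ACGTTGCA"
reads = [(0, "ACGTTG"), (0, "ACGATG"), (2, "GATGCA"), (1, "CGATGC"), (2, "GTTGCA")]
print(call_snps(reference, reads))  # [(4, 'T', 'A', 5, 0.6)]
```

Production callers replace the fixed thresholds with probabilistic genotype likelihoods that use base and mapping qualities. The pileup‐then‐call structure is nonetheless the same.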

      Another widely used pipeline for NGS data analysis is dDocent (www.dDocent.com), a simple bash wrapper that performs quality control, assembly, read mapping, and SNP calling for almost any kind of RAD sequencing data (Puritz et al. 2014). However, most of these pipelines are difficult to use for students with little bioinformatics background. They also differ in the genome complexity they can handle and in the computational resources they require. In addition, several bioinformatics tools, such as BWA, Bowtie2, SAMtools, GATK, and BCFtools, along with a set of Perl utility scripts (Kagale et al. 2016), can be used for GBS data analysis. However, using these tools effectively requires knowing how to install and run them. With advances in NGS approaches, GBS has become widely used in plant breeding and genetics, particularly for understanding complex quantitative traits.
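To show how the standalone tools listed above fit together, here is a minimal Python driver for one common chain: BWA for alignment, SAMtools for sorting and indexing, and BCFtools for SNP calling. This is not dDocent's workflow and not the Perl scripts cited above. It also omits demultiplexing, read groups, multi‐sample calling, and GATK. The file names and the helper names `gbs_commands` and `run` are hypothetical, and the tools are assumed to be installed and on `PATH`. By default the driver only prints the commands (a dry run).

```python
import shlex
import subprocess


def gbs_commands(ref: str, fastq: str, sample: str, threads: int = 4):
    """Return (argv, stdout_file) steps for a minimal align-and-call chain.

    File names are hypothetical; bwa, samtools and bcftools must be on PATH.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    bcf = f"{sample}.mpileup.bcf"
    vcf = f"{sample}.calls.vcf.gz"
    return [
        (["bwa", "index", ref], None),                 # index reference for BWA
        (["samtools", "faidx", ref], None),            # .fai index for bcftools
        (["bwa", "mem", "-t", str(threads), ref, fastq], sam),  # SAM on stdout
        (["samtools", "sort", "-o", bam, sam], None),  # coordinate-sorted BAM
        (["samtools", "index", bam], None),            # BAM index (.bai)
        # Per-position genotype likelihoods as uncompressed BCF (-O u).
        (["bcftools", "mpileup", "-f", ref, "-O", "u", "-o", bcf, bam], None),
        # -m: multiallelic caller, -v: variant sites only, -O z: bgzipped VCF.
        (["bcftools", "call", "-m", "-v", "-O", "z", "-o", vcf, bcf], None),
    ]


def run(steps, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv, stdout_file in steps:
        suffix = f" > {stdout_file}" if stdout_file else ""
        print(shlex.join(argv) + suffix)
        if dry_run:
            continue
        if stdout_file:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
        else:
            subprocess.run(argv, check=True)


run(gbs_commands("ref.fa", "sample1.fq", "sample1"))
```

Passing `dry_run=False` executes the steps in order. Because `check=True` is set, the chain stops at the first command that fails.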

      DArT‐seq GBS (https://www.diversityarrays.com/technology-and-resources/dartseq/) partly overcomes the limitation of missing data points. The technique is an extension of traditional DArT technology in which DArT representations are sequenced on an NGS platform. Sequencing the fragments greatly increases both the number of genomic fragments analyzed and the number of reported markers, which makes DArT‐seq more cost‐effective than the original DArT method.
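Missing data is usually quantified per marker as the fraction of individuals without a genotype call, and markers above a chosen threshold are dropped. The sketch below assumes VCF‐style genotype strings (`./.` for missing) and a hypothetical 20% cutoff. It does not use DArT‐seq's own report format and is meant only to show how the missing‐data limitation is measured.

```python
MISSING = {"./.", ".|.", "."}  # VCF-style missing genotype codes (assumed format)


def missing_rate(calls: list[str]) -> float:
    """Fraction of individuals with no genotype call at one marker."""
    if not calls:
        return 1.0  # treat a marker with no individuals as entirely missing
    return sum(gt in MISSING for gt in calls) / len(calls)


def filter_markers(
    matrix: dict[str, list[str]], max_missing: float = 0.2
) -> dict[str, list[str]]:
    """Keep markers whose missing-data rate is at most max_missing."""
    return {m: calls for m, calls in matrix.items() if missing_rate(calls) <= max_missing}


# Hypothetical genotype matrix: marker -> calls across five individuals.
matrix = {
    "snp1": ["0/0", "0/1", "1/1", "0/0", "0/0"],
    "snp2": ["./.", "0/1", "./.", "1/1", "0/0"],
    "snp3": ["0/0", "./.", "1/1", "0/1", "0/0"],
}
for marker, calls in matrix.items():
    print(marker, missing_rate(calls))
print(sorted(filter_markers(matrix)))  # ['snp1', 'snp3']
```

A stricter threshold keeps fewer but more complete markers. The trade-off between marker count and completeness is the practical side of the missing‐data problem discussed above.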

      1.5.2 Whole‐Genome Resequencing (WGR)

      1.5.3 SNP Arrays