Why does Hardy–Weinberg work?
A proof of Hardy–Weinberg.
Hardy–Weinberg with more than two alleles.
The Hardy–Weinberg equation is one of the most basic expectations we have in population genetics. It is very likely that you were already familiar with the Hardy–Weinberg equation before you picked up this book. But where does Hardy–Weinberg actually come from? What is the logic behind it? Let's develop a simple proof that Hardy–Weinberg is actually true. This will also be our first real foray into the type of algebraic argument on which much of population genetics is built. Because you start out knowing the conclusion of the Hardy–Weinberg tale, you can focus on the style in which it is told. Algebraic or quantitative arguments are a central part of the language and vocabulary of population genetics, so part of the task of learning population genetics is becoming accustomed to this mode of discourse.
We would like to prove that $p^2 + 2pq + q^2 = 1$ accurately predicts genotype frequencies given the values of allele frequencies. Let's start off by making some explicit assumptions to bound the problem. The assumptions, in no particular order, are:
1. mating is random (parents meet and mate according to their frequencies);
2. all parents have the same number of offspring (equivalent to no natural selection on fecundity);
3. all progeny are equally fit (equivalent to no natural selection on viability);
4. there is no mutation that could act to change an A to a or an a to A;
5. it is a single population that is very large;
6. there are two and only two mating types.
Now, let's define the variables we will need for a case with one locus that has two alleles (A and a).
N = population size in individuals (N diploid individuals carry 2N alleles)

Allele frequencies
p = frequency(A allele) = (total number of A alleles)/2N
q = frequency(a allele) = (total number of a alleles)/2N
p + q = 1

Genotype frequencies
X = frequency(AA genotype) = (number of AA individuals)/N
Y = frequency(Aa genotype) = (number of Aa individuals)/N
Z = frequency(aa genotype) = (number of aa individuals)/N
X + Y + Z = 1
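The definitions above can be checked with a small sketch that starts from genotype counts. The function name `frequencies_from_counts` and the counts (30 AA, 50 Aa, 20 aa) are hypothetical, chosen only for illustration; exact fractions are used so that the identities p + q = 1 and X + Y + Z = 1 hold without rounding error.

```python
from fractions import Fraction

def frequencies_from_counts(n_AA: int, n_Aa: int, n_aa: int):
    """Hypothetical helper: return (p, q, X, Y, Z) from genotype counts,
    following the definitions of allele and genotype frequencies."""
    N = n_AA + n_Aa + n_aa          # number of diploid individuals
    total_alleles = 2 * N           # N diploids carry 2N alleles
    count_A = 2 * n_AA + n_Aa       # each AA carries two A alleles, each Aa carries one
    count_a = 2 * n_aa + n_Aa       # each aa carries two a alleles, each Aa carries one
    p = Fraction(count_A, total_alleles)
    q = Fraction(count_a, total_alleles)
    X = Fraction(n_AA, N)
    Y = Fraction(n_Aa, N)
    Z = Fraction(n_aa, N)
    return p, q, X, Y, Z

# Made-up example population of 100 individuals
p, q, X, Y, Z = frequencies_from_counts(30, 50, 20)
print(p, q, X, Y, Z)  # 11/20 9/20 3/10 1/2 1/5
assert p + q == 1 and X + Y + Z == 1
```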
We do not distinguish between the heterozygotes Aa and aA and treat them as being equivalent genotypes. Therefore, we can express allele frequencies in terms of genotype frequencies by adding together the frequencies of A‐containing and a‐containing genotypes:
$p = X + \tfrac{1}{2}Y$ (2.2)
$q = Z + \tfrac{1}{2}Y$ (2.3)
Each homozygote contains two alleles of the same type, while each heterozygote contains one allele of each type, so the heterozygote frequency is weighted by one-half in each equation.
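Equations (2.2) and (2.3) translate directly into a few lines of code. This is a minimal sketch; the function name `allele_freqs_from_genotypes` and the genotype frequencies used are hypothetical, chosen only to illustrate the arithmetic.

```python
from fractions import Fraction

def allele_freqs_from_genotypes(X, Y, Z):
    """Hypothetical helper for equations (2.2) and (2.3):
    homozygotes count fully, heterozygotes count one-half."""
    p = X + Y / 2   # frequency of A
    q = Z + Y / 2   # frequency of a
    return p, q

# Made-up genotype frequencies (they must sum to 1)
X, Y, Z = Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)
p, q = allele_freqs_from_genotypes(X, Y, Z)
print(p, q)  # 11/20 9/20
assert p + q == X + Y + Z == 1
```

These frequencies correspond, for instance, to 30 AA, 50 Aa and 20 aa individuals out of 100; counting alleles directly gives 2(30) + 50 = 110 A alleles out of 200, which is the same 11/20.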
With the variables defined, we can then follow allele frequencies across one generation of reproduction. The first step is to calculate the probability that parents of any two particular genotypes will mate. Since mating is assumed to be random, the chance that two genotypes will mate is just the product of their individual frequencies. As shown in Figure 2.7, random mating can be thought of as being like gas atoms in a balloon. As with gas atoms, each genotype or gamete bumps into others at random, with the probability of a collision (or mating or union) being the product of the frequencies of the two objects colliding. To calculate the probabilities of mating among the three different genotypes, we can make a table to organize the resulting mating frequencies. This table will predict the mating frequencies among genotypes in the initial generation, which we will call generation t.
Figure 2.7 A schematic representation of random mating as a cloud of gas in which the frequency of A's is 14/24 and the frequency of a's is 10/24. Any given A will encounter another A with probability 14/24 or an a with probability 10/24. This makes the frequency of an A‐A collision $(14/24)^2$ and the frequency of an A‐a or a‐A collision $2(14/24)(10/24)$, just as the probability of two independent events is the product of their individual probabilities. The population of A's and a's is assumed to be large enough that taking one out of the cloud makes almost no change in the overall frequency of its type.
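The gas-cloud picture in Figure 2.7 can be sketched in code, assuming (as the caption does) that the cloud is so large that drawing one item does not change the frequencies, which amounts to sampling with replacement. The function names and the Monte Carlo check are illustrative additions, not part of the original argument; the exact probabilities use the caption's frequency of 14/24 for A.

```python
import random
from fractions import Fraction

def pair_probabilities(p):
    """Exact probabilities of A-A, A-a (either order) and a-a encounters
    when two items are drawn independently from a cloud with A frequency p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def simulate_encounters(p, n_pairs, seed=1):
    """Monte Carlo check: draw pairs independently (with replacement,
    the large-cloud assumption) and return the observed pair-type frequencies."""
    rng = random.Random(seed)
    tally = {"AA": 0, "Aa": 0, "aa": 0}
    for _ in range(n_pairs):
        first = "A" if rng.random() < p else "a"
        second = "A" if rng.random() < p else "a"
        # Sorting puts "A" before "a", so an a-A encounter is tallied as "Aa"
        tally["".join(sorted((first, second)))] += 1
    return {k: v / n_pairs for k, v in tally.items()}

exact = pair_probabilities(Fraction(14, 24))
print({k: str(v) for k, v in exact.items()})  # {'AA': '49/144', 'Aa': '35/72', 'aa': '25/144'}
print(simulate_encounters(14 / 24, 100_000))  # close to the exact values
```

Sorting the two drawn alleles is one way to treat A‐a and a‐A encounters as the same outcome, which is why the heterozygous term carries the factor of 2.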
A parental mating frequency table (generation t) is shown below.
| Moms (frequency) \ Dads (frequency) | AA (X) | Aa (Y) | aa (Z) |
| --- | --- | --- | --- |
| AA (X) | $X^2$ | $XY$ | $XZ$ |
| Aa (Y) | $XY$ | $Y^2$ | $YZ$ |
| aa (Z) | $ZX$ | $ZY$ | $Z^2$ |
The table expresses parental mating frequencies in the currency of genotype frequencies. For example, we expect matings between AA moms and Aa dads to occur with a frequency of XY.
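The parental mating table can be generated mechanically: under random mating each cell is just the product of the mom's and dad's genotype frequencies. The sketch below assumes hypothetical generation‐t frequencies X = 3/10, Y = 1/2 and Z = 1/5, and the name `mating_table` is illustrative. Because the nine cells together expand $(X + Y + Z)^2$, they must sum to 1.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")

def mating_table(X, Y, Z):
    """Hypothetical helper: frequency of each (mom, dad) genotype pairing
    under random mating, i.e. the product of the two genotype frequencies."""
    freq = dict(zip(GENOTYPES, (X, Y, Z)))
    return {(mom, dad): freq[mom] * freq[dad]
            for mom, dad in product(GENOTYPES, repeat=2)}

# Made-up generation-t genotype frequencies
X, Y, Z = Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)
table = mating_table(X, Y, Z)
for mom in GENOTYPES:
    # One row per mom genotype, columns in the order AA, Aa, aa for dads
    print(mom, [str(table[(mom, dad)]) for dad in GENOTYPES])

# The nine cells expand (X + Y + Z)**2, so they sum to 1
assert sum(table.values()) == (X + Y + Z) ** 2 == 1
```

For example, the AA‐mom by Aa‐dad cell comes out as XY = 3/20, matching the entry in the table.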
Next, we need to determine the frequency of each genotype in the offspring of any given parental mating pair. This will require that we predict the offspring genotypes resulting from each possible parental mating. We can do this easily with a Punnett square. We will use the frequencies of each parental mating (above) together with the frequencies of the offspring genotypes. Summed for all possible parental matings, this gives the frequency of offspring genotypes one generation later, or in generation t + 1. A table will help organize all the frequencies, like the offspring frequency table (generation t