
<typo fs:x-large> Chapter 18. The Hardy-Weinberg equilibrium</typo>

In earlier chapters, we have been carrying out genetic analysis using controlled crosses where the genotypes are known and we can breed organisms however we want. This approach works well when you are studying a model organism, such as Drosophila or mice. It doesn't work well when you are interested in the genetics of organisms that cannot be studied in this way. For the next several chapters, we will consider genetics from the point of view of populations. We will treat this subject entirely from the perspective of human population studies where population genetics is used to get the type of information that would ordinarily be obtained by breeding experiments in model organisms.

In the next several chapters, we will use a substantial amount of math. To avoid confusion with fractions, we will write genotypes as $A/a$ instead of $\frac{A}{a}$.

===== The Hardy-Weinberg equilibrium =====

At the heart of population genetics is the concept of allele frequency. Consider a human gene with two alleles: $A$ and $a$. The frequency of $A$ is $f(A)$; the frequency of $a$ is $f(a)$. We define the following symbols:

<figure Fig1>
$$p = f(A)\\
q = f(a)$$
<caption>
Defining the symbols $p$ and $q$ as allele frequencies. This assumes that there are only two alleles of a gene in a population. In more genetically diverse populations, there may be more than two alleles for any given gene. 
</caption>
</figure>

$p$ and $q$ can be thought of as probabilities of %%selecting%% the given alleles by random sampling. For example, $p$ for a given population of humans is the probability of finding allele $A$ by selecting an individual from that population at random and then %%selecting%% one of their two alleles at random.

Since $p$ and $q$ are probabilities and in this example there are only two possible alleles, we know from probability theory that the probabilities of all possibilities must add up to one, or:

<figure Fig2>
$$p + q = 1$$
<caption>
The frequencies (probabilities) of all possibilities must add up to one. 
</caption>
</figure>

Correspondingly, there are three possible genotypes, and therefore three possible genotype frequencies:

<figure Fig3>
$$f(A/A) + f(A/a) + f(a/a) = 1$$
<caption>
Three possible genotypes with two alleles, and how we represent their genotype frequencies. All possible genotype frequencies must add up to one.
</caption>
</figure>


We usually can't measure allele frequencies directly. Rather, we can derive them from the frequencies of the different genotypes that are present in a population:

<figure Fig4>
$$p = f(A/A)+\frac{1}{2} f(A/a)\\
q = f(a/a)+\frac{1}{2} f(A/a)$$
<caption>
Derivation of allele frequencies based on genotype frequencies.
</caption>
</figure>

Here is an example: $M$ and $N$ are alleles of a gene that specifies different blood antigens. The alleles are codominant, so a simple blood test can distinguish the three possible genotypes of $M/M$, $M/N$, and $N/N$. A survey of the population reveals that 83% of the population has only the $M$ antigen, 1% has only the $N$ antigen, and 16% has both the $M$ and $N$ antigens. In other words:

<figure Fig5>
$$f(M/M) = 0.83, f(M/N) = 0.16, f(N/N) = 0.01$$
<caption>
Observed phenotype frequencies (and therefore genotype frequencies) of hypothetical blood antigens $M$ and $N$.
</caption>
</figure>

Based on this information, we can calculate $p$ and $q$:

<figure Fig6>
$$p = f(M) = 0.83 + \frac{1}{2}\cdot0.16 = 0.91\\
q = f(N) = 0.01 + \frac{1}{2}\cdot0.16 = 0.09$$
<caption>
Derivation of allele frequencies for $M$ and $N$ based on the genotype frequencies given in Fig. {{ref>Fig5}}.
</caption>
</figure>
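
The calculation in Fig. {{ref>Fig6}} can be sketched in a few lines of Python. This is only an illustration under the two-allele assumption used above; the function name ''allele_frequencies'' is our own and does not come from any library.

```python
def allele_frequencies(f_hom1, f_het, f_hom2):
    """Return (p, q) from the three genotype frequencies of a two-allele gene.

    f_hom1 is f(A/A), f_het is f(A/a), and f_hom2 is f(a/a).
    """
    total = f_hom1 + f_het + f_hom2
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must add up to one")
    # A homozygote contributes only its own allele; a heterozygote
    # contributes each allele half of the time.
    p = f_hom1 + 0.5 * f_het
    q = f_hom2 + 0.5 * f_het
    return p, q


# Hypothetical M/N survey from the text: f(M/M)=0.83, f(M/N)=0.16, f(N/N)=0.01
p, q = allele_frequencies(0.83, 0.16, 0.01)
print(f"p = f(M) = {p:.2f}, q = f(N) = {q:.2f}")
# prints: p = f(M) = 0.91, q = f(N) = 0.09
```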

We can get both $p$ and $q$ with just two of the genotype frequencies because the three genotype frequencies must total to a frequency of 1:

<figure Fig7>
$$f(M/M) + f(M/N) + f(N/N) = 1$$
<caption>
Three possible genotypes with two alleles in a population $M$ and $N$, with their genotype frequencies adding up to one. This figure is essentially the same as Fig. {{ref>Fig3}} except that we are using the codominant alleles $M$ and $N$ in our example here instead of the generic $A$ and $a$ in Fig. {{ref>Fig3}}. 
</caption>
</figure>

Now let's think about how the inverse calculation would be performed. That is, how would we derive the genotype frequencies for the "F1" generation based on the allele frequencies of the "P" generation? To do this we must make an assumption about the frequency of mating of parents with different genotypes. If we assume that the gametes mix at random, we can calculate the compound probabilities of obtaining each possible combination of alleles in the offspring:

<table Tab1>
<columns 100% *100%*>

^   ^  egg  ^^
^  sperm  ^  $M$ ($p$)  ^  $N$ ($q$)  ^
^  $M$ ($p$)  |  $M/M$ ($p^2$)  |  $M/N$ ($p \cdot q$)  |
^  $N$ ($q$)  |  $N/M$ ($p \cdot q$ )  |  $N/N$ ($q^2$)  |
</columns>
<caption>
Calculating genotype frequencies of offspring based on allele frequencies. Does this table format look familiar to you?
</caption>
</table>

Thus the genotype frequencies for the next generation are:

<figure Fig8>
<div centeralign>
$f(M/M) = p^2$, $f(M/N) = 2pq$, and $f(N/N) = q^2$
</div>
<caption>
Calculating genotype frequencies based on allele frequencies, continued from Table {{ref>Tab1}}. Note that since $p+q=1$ (Fig. {{ref>Fig2}}), it follows that $(p+q)^2=1$ and therefore $p^2+2pq+q^2=1$, which is equivalent to saying that $f(M/M) + f(M/N) + f(N/N) = 1$. In other words, the genotype frequencies of all possible genotypes must add up to one. Does this math look familiar to you?
</caption>
</figure>

We can now calculate the new $p_1$ for the new generation using the formula for deriving allele frequencies from genotype frequencies seen in Fig. {{ref>Fig4}}:

<figure Fig9>
$$\begin{aligned} p_1 &= f(M/M) + \frac{1}{2} f(M/N)\\
&= p^2 + \frac{1}{2}(2pq)\\
&= p^2 + pq\\
&=p (p +q)\\
&=p \end{aligned}$$
<caption>
If random mating occurs, then the frequency of alleles does not change from one generation ($p$) to another ($p_1$).
</caption>
</figure>


From Fig. {{ref>Fig9}} we obtain the simple but very important result that when mixing of gametes occurs at random, the allele frequencies do not change from one generation to the next (i.e., $p_1=p$). We describe any gene that has this property as being in genetic equilibrium. When applied to all genes and alleles in a population, this condition is known as a Hardy-Weinberg equilibrium, named after British mathematician [[wp>G._H._Hardy|G.H. Hardy]] and German physician [[wp>Wilhelm_Weinberg|Wilhelm Weinberg]]. Sometimes geneticists use "Hardy-Weinberg equilibrium" as a synonym for genetic equilibrium. 
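
To see the equilibrium numerically, here is a minimal Python sketch that repeats the two steps above for a few generations: random mixing of gametes gives the genotype frequencies, and the genotype frequencies give back the allele frequency. The function names are hypothetical, and the starting value $p=0.91$ is simply taken from the $M$/$N$ example.

```python
def next_generation(p):
    """Genotype frequencies after one round of random mixing of gametes."""
    q = 1.0 - p
    return {"M/M": p * p, "M/N": 2 * p * q, "N/N": q * q}


def frequency_of_M(genotypes):
    """Allele frequency of M: f(M/M) + 1/2 f(M/N)."""
    return genotypes["M/M"] + 0.5 * genotypes["M/N"]


p = 0.91  # f(M) from the worked example in the text
for generation in range(1, 4):
    genotypes = next_generation(p)
    p = frequency_of_M(genotypes)
    rounded = {g: round(f, 4) for g, f in genotypes.items()}
    print(generation, rounded, "p =", round(p, 4))
```

Each pass through the loop should report the same genotype frequencies and the same $p$, which is the point of Fig. {{ref>Fig9}}.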

===== Examples of populations in (and not in) Hardy-Weinberg equilibria =====

If we know the genotype frequencies and allele frequencies of a gene in a population, then we can ask whether that gene is in Hardy-Weinberg equilibrium in that population by determining whether the genotype frequencies reflect random mixing of alleles. Consider two different populations that have different genotype frequencies and different allele frequencies (Table {{ref>Tab2}}):

<table Tab2>
<columns 100% *100%*>
^    ^  $M/M$  ^  $M/N$  ^  $N/N$  ^  $p$  ^  $q$  ^
^  U.S. Caucasians  |  0.29  |  0.5  |  0.21  |  0.54  |  0.46  |
^  American Inuit  |  0.84  |  0.16  |  0.008  |  0.92  |  0.08  |
</columns>
<caption>
Hypothetical dataset showing observed genotype frequencies of codominant blood antigens $M$ and $N$, and calculated allele frequencies $p$ and $q$ in U.S. Caucasians vs American Inuit. 
</caption>
</table> 

Although the allele frequencies are quite different, both populations have genotype frequencies and allele frequencies that fit the Hardy-Weinberg equilibrium. For instance, in the U.S. Caucasian population, the frequencies of $M/M$, $M/N$, and $N/N$ are obtained by surveying the population (this is your primary data). From these frequencies, we can calculate $p$ and $q$ using the formula in Fig. {{ref>Fig4}}. The genotype frequencies expected from random mixing, $p^2 \approx 0.29$, $2pq \approx 0.50$, and $q^2 \approx 0.21$, match the observed values.

Now let's consider two sample populations that have the same allele frequencies but have different genotype frequencies.

<table Tab3>
<columns 100% *100%*>
^    ^  $M/M$  ^  $M/N$  ^  $N/N$  ^  $p$  ^  $q$  ^
^  population 1  |  0.2  |  0.2  |  0.6  |  0.3  |  0.7  |
^  population 2  |  0.09  |  0.42  |  0.49  |  0.3  |  0.7  |
</columns>
<caption>
Dataset showing observed genotype frequencies of codominant blood antigens $M$ and $N$, and calculated allele frequencies $p$ and $q$ in two different hypothetical populations. Compare to Table {{ref>Tab2}}.
</caption>
</table>

Based on the observed genotype frequencies of population 1, we can calculate $p$ and $q$, and we can further derive $p^2=0.09$, $2pq=0.42$, and $q^2=0.49$. This clearly does not match the observed genotype frequencies of $f(M/M)=0.2$, $f(M/N)=0.2$, and $f(N/N)=0.6$. $p^2$ and $q^2$ are less than the observed frequencies of the two homozygotes, and $2pq$ is greater than the observed frequency of the heterozygote, suggesting that something other than random mixing of gametes is causing heterozygotes to be underrepresented. Population 1 is not in Hardy-Weinberg equilibrium. 

We can do the same analysis for population 2. Even though the observed genotype frequencies are different, the allele frequencies $p$ and $q$ wind up being the same as population 1. And when we now use the allele frequencies to derive $p^2$, $2pq$, and $q^2$, we find they match the observed genotype frequencies. Therefore, population 2 is in Hardy-Weinberg equilibrium. 
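
The comparison we just did by hand for the two populations can be sketched as follows. The function name and the ''tolerance'' cutoff are assumptions made for this illustration only; a real study would work from counts of individuals and use a statistical test rather than a fixed cutoff.

```python
def hardy_weinberg_check(f_mm, f_mn, f_nn, tolerance=0.01):
    """Compare observed genotype frequencies with p^2, 2pq, and q^2.

    `tolerance` is an arbitrary cutoff chosen for this illustration.
    """
    p = f_mm + 0.5 * f_mn
    q = f_nn + 0.5 * f_mn
    expected = (p * p, 2 * p * q, q * q)
    observed = (f_mm, f_mn, f_nn)
    fits = all(abs(o - e) <= tolerance for o, e in zip(observed, expected))
    return expected, fits


populations = {
    "population 1": (0.2, 0.2, 0.6),
    "population 2": (0.09, 0.42, 0.49),
}
for name, observed in populations.items():
    expected, fits = hardy_weinberg_check(*observed)
    verdict = "fits" if fits else "does not fit"
    print(name, "expected", [round(e, 2) for e in expected], verdict)
```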

Here is a graph showing the relationship between allele and genotype frequencies for genes that are in Hardy-Weinberg equilibrium:

<figure Fig10>
{{ :hw_equilibrium.png?400 |}}
<caption>
Relationship between allele frequency $p$ (and $q$) and genotype frequency for genes in Hardy-Weinberg equilibrium. 
</caption>
</figure>

Before (Fig. {{ref>Fig4}}), we needed at least two of the genotype frequencies to calculate the allele frequencies. But if we know that the population is in Hardy-Weinberg equilibrium, we can get both allele frequencies and all genotype frequencies from just one of the genotype frequencies or one of the allele frequencies.
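
As a rough sketch of this shortcut, the Python below recovers everything from the frequency of one homozygote, assuming the gene is in Hardy-Weinberg equilibrium, and then tabulates a few points on the curves in Fig. {{ref>Fig10}}. The function names are hypothetical.

```python
import math


def genotypes_from_p(p):
    """Hardy-Weinberg genotype frequencies for allele frequency p = f(A)."""
    q = 1.0 - p
    return {"A/A": p * p, "A/a": 2 * p * q, "a/a": q * q}


def p_from_recessive_homozygotes(f_aa):
    """At equilibrium f(a/a) = q^2, so q = sqrt(f(a/a)) and p = 1 - q."""
    return 1.0 - math.sqrt(f_aa)


# Population 2 from the text: knowing only f(N/N) = 0.49 is enough.
p = p_from_recessive_homozygotes(0.49)
print(round(p, 2), {g: round(f, 2) for g, f in genotypes_from_p(p).items()})

# A few points along the equilibrium curves.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, genotypes_from_p(p))
```

The heterozygote frequency $2pq$ is largest when $p=q=0.5$, where it reaches 0.5.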

===== Hardy-Weinberg vs. the real (human) world =====

Our discussion of the Hardy-Weinberg equilibrium above assumes random mating: for a human population to be in Hardy-Weinberg equilibrium, there must be random mating within the population. How good is the random mating assumption in actual human populations? These are some of the conditions that affect the random mating assumption and therefore may affect whether a population can be in Hardy-Weinberg equilibrium:

==== Genotypic effects on choice of mating partner ====

Examination of allele frequencies and genotype frequencies for most genes in human populations reveals that they closely fit Hardy-Weinberg equilibrium. The implication is that in general, humans choose their mates at random with respect to individual genes and alleles. This may seem odd given that personal experience says that choosing a mate is anything but random. However, the usual criteria for choosing mates, such as character, appearance, and social position, are largely not determined genetically and, to the extent that they are genetically determined, these are all very complex traits that are influenced by a large number of different genes. The net result is that our decision about with whom we have children does not in general systematically favor some alleles over others. 

One of the exceptional conditions that produce a population that is not in Hardy-Weinberg equilibrium is known as assortative mating, which means preferential mating between similar individuals. For example, individuals with inherited deafness have a relatively high probability of having children together. But even this type of assortative mating will only affect the genotype frequencies related to deafness.


==== New mutations ====

Although new mutations continually arise, mutation rates are usually sufficiently small that in any single generation their effect on allele frequencies is negligible. As will be discussed in the next chapter, the effect of mutations compounded over many generations can have a significant effect on allele frequencies.

==== Selection ====

"%%Selection%%" has a slightly different but still related meaning than "selection" when discussing a genetic trick for isolating rare events in [[chapter_08#screen_selection|Chapter 08]]. "%%Selection%%" in this context refers to differences in survival or reproduction of different genotypes. Like new mutations, the effect of selection is usually small in any single generation and therefore usually does not affect Hardy-Weinberg equilibria. An exception would be a recessive lethal mutation that would render the genotype frequency of the homozygote to be zero regardless of the genotype frequency of the heterozygote. As will be discussed in the next chapter, the effect of selection can have a significant effect over many generations.

==== Genetic drift/founder effect ====

In small populations, only a small number of individuals pass their alleles on to the next generation. Under these circumstances, chance fluctuations in the alleles that are transmitted can cause significant changes in allele frequency. These effects are usually insignificant for large populations such as in the U.S.

To see how this would happen, consider a gene in a very large population with a single major dominant allele $A$ and 10 minor recessive alleles $a_1$, $a_2$, $a_3$, ..., $a_{10}$ with allele frequencies $f(a_1) = f(a_2) = f(a_3) = \dots = 10^{-4}$ and $f(A)=0.999$. Now imagine that a group of 500 individuals from this population move to an island to start a new population. 

The aggregate frequency of recessive alleles in this population is $f(a_n) = 10\times10^{-4}=10^{-3}$. We can calculate the probability that exactly $n$ individuals are carriers (either $a_n/a_n$ or $A/a_n$) as:

<figure Fig11>
$$P(n)=(p^2+2pq)^n \cdot (q^2)^{500-n} \cdot \frac{500!}{n!(500-n)!}$$
<caption>
Probability that $n$ number of individuals among the 500 moving to the island are carriers of one of the recessive alleles $a_n$. Based on the information given in the text above, we know that $p=f(a_n)=0.001$ and $q=0.999$.
</caption>
</figure>

Plugging in different integer values for $n$ and values for $p$ and $q$ as described in the legend for Fig. {{ref>Fig11}}, we get the following table:

<table Tab4>
<columns 100% *100%*>
^  number of carriers $n$  ^  probability that $n$ carriers are \\ part of the migrating population  ^
|  1  |  0.368  |
|  2  |  0.184  |
|  3  |  0.061  |
|  4  |  0.015  |
|  5  |  0.003  |
|  6  |  <0.001  |
</columns>
<caption>
Probability table for likelihood of there being carriers of rare recessive alleles in migrating population in our example. 
</caption>
</table>

The way to interpret Table {{ref>Tab4}} is to say, "The probability that there is exactly one carrier in the population is 0.368", "The probability that there are exactly two carriers in the population is 0.184", etc. The likelihood that there will be at least one carrier is actually pretty good; that is given as $1-(q^2)^{500} = 0.632$. Since there are 10 recessive alleles, the minimum number of carriers you would need to have all 10 alleles move to the new population would be $n=5$, since humans are diploid. But the likelihood of there being 5 carriers in the population is 0.003, or less than 1%. Because the alleles are rare, it's much more likely that only one or two carriers will be part of the migration. Also, it's highly unlikely that any of the carriers will be homozygous for any of the recessive alleles (or even carry two different recessive alleles). 
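
The entries in Table {{ref>Tab4}} can be reproduced with a short Python sketch of the formula in Fig. {{ref>Fig11}}. The function name and its parameter names are our own; the default values are the ones given in the text (500 founders, aggregate recessive allele frequency of 0.001).

```python
from math import comb


def probability_of_n_carriers(n, founders=500, f_recessive=0.001):
    """Binomial probability that exactly n founders carry a recessive allele."""
    f_dominant = 1.0 - f_recessive       # frequency of A (0.999 in the text)
    non_carrier = f_dominant ** 2        # probability a founder is A/A
    carrier = 1.0 - non_carrier          # probability of A/a_n or a_n/a_n
    return comb(founders, n) * carrier ** n * non_carrier ** (founders - n)


for n in range(0, 7):
    print(n, round(probability_of_n_carriers(n), 3))

print("at least one carrier:", round(1 - probability_of_n_carriers(0), 3))
```

The sketch also prints the $n=0$ case, which is not listed in the table: the probability that no carrier at all makes the trip.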

Let's say allele $a_1$ is brought over by a single individual with genotype $A/a_1$. This means that alleles $a_2$ through $a_{10}$ are now lost in the new population. The new allele frequency for $a_1$ is now 1 in 1000, or $f(a_1)=10^{-3}$, a 10-fold increase! Thus, in a stochastic fashion most of the rare recessive alleles will be lost, whereas an occasional rare allele will experience an increase in frequency. The smaller the founding population, the more likely that a rare allele will be lost and the greater the increase in frequency experienced by the alleles that happen to be carried over.
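
The founding event can also be imitated with a small simulation. The sketch below is a simplification: it draws the $2 \times 500$ founding allele copies independently from the source population, which assumes that the source population is very large and in Hardy-Weinberg equilibrium. The function name, parameters, and the seed value are all hypothetical choices.

```python
import random
from collections import Counter


def found_population(founders=500, n_rare=10, f_rare=1e-4, seed=None):
    """Allele frequencies among the founders of a new population.

    Each founder carries two allele copies, drawn independently here.
    """
    rng = random.Random(seed)
    alleles = ["A"] + [f"a{i}" for i in range(1, n_rare + 1)]
    weights = [1.0 - n_rare * f_rare] + [f_rare] * n_rare
    copies = rng.choices(alleles, weights=weights, k=2 * founders)
    counts = Counter(copies)
    return {allele: counts[allele] / (2 * founders) for allele in alleles}


frequencies = found_population(seed=1)
for allele, frequency in frequencies.items():
    if allele != "A":
        status = "lost" if frequency == 0 else f"now {frequency:.3f}"
        print(allele, status)
```

Changing the seed gives a different founding group. Any rare allele that does arrive has a frequency of at least 1 in 1000, ten times its frequency in the source population, and the rest are absent.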

==== Migration of individuals between different populations ====

When individuals from populations with different allele frequencies mix, the combined population will be in Hardy-Weinberg equilibrium after one generation of random mating. The combined population will be out of equilibrium to the extent that mating is assortative.
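
Here is a minimal sketch of this idea, using the allele frequencies from Table {{ref>Tab2}}. The equal-sized mixture of the two populations is a hypothetical scenario chosen only for illustration, and the function names are our own.

```python
def hardy_weinberg(p):
    """Genotype frequencies (M/M, M/N, N/N) for allele frequency p = f(M)."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


def frequency_of_M(genotypes):
    """Allele frequency of M: f(M/M) + 1/2 f(M/N)."""
    return genotypes[0] + 0.5 * genotypes[1]


# Two source populations, each assumed to be in equilibrium on its own.
population_a = hardy_weinberg(0.54)
population_b = hardy_weinberg(0.92)

# Hypothetical 50:50 mixture, before any mating between the groups.
mixed = tuple(0.5 * a + 0.5 * b for a, b in zip(population_a, population_b))
p_mixed = frequency_of_M(mixed)

# One generation of random mating in the combined population.
offspring = hardy_weinberg(p_mixed)

print("mixed parents:", [round(f, 3) for f in mixed], "p =", round(p_mixed, 2))
print("offspring:    ", [round(f, 3) for f in offspring])
```

Immediately after mixing, heterozygotes are less common than $2pq$ for the combined allele frequency. After one round of random mating the genotype frequencies fit $p^2$, $2pq$, and $q^2$, and the allele frequency itself is unchanged.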

===== An example of calculating allele frequency in humans: albinism =====

If we are considering rare alleles, we can make the following approximations, which allow us to avoid a lot of messy algebra in our calculations.

<div centeralign>
For $f(A)=p$ and $f(a)=q$,\\
If $q \ll 1$, then $p \approx 1$
</div>

Therefore, $f(A/A)=p^2 \approx 1$, $f(A/a)=2pq \approx 2q$, and $f(a/a)=q^2$. Since most genetic diseases are rare, these approximations are valid for many of the population genetics calculations that are of medical importance. Note that by convention, the $q$ symbol is used to represent rare alleles.  

Let's look at a real-life example. Albinism occurs in approximately 1 in 20,000 individuals in humans. Let's say that this condition is due to a recessive allele $a$ of a single gene that is in Hardy-Weinberg equilibrium. Based on this information, we can derive the allele frequency for $a$:

$$f(a/a) = \frac{1}{20000} = 5\times10^{-5}=q^2\\
q=\sqrt{5\times10^{-5}}=7\times10^{-3}$$

And based on this we can also calculate the frequency of heterozygotes in the population:

$$f(A/a)=2pq\approx 2q = 1.4\times10^{-2}$$

In other words, approximately 1 in 70 humans are carriers for albinism. We can next calculate the fraction of alleles for albinism that are in individuals that are actually albinos (i.e., their genotype is $a/a$). If we let $\text{N}$ = population size, then the number of alleles in homozygotes will be $2\times\text{N}\cdot q^2$. The number of alleles in heterozygotes (carriers) will be $1\times\text{N} \cdot 2pq \approx \text{N}(2q)$. Therefore, the fraction of $a$ alleles in homozygotes is:

$$ \frac{2\times\text{N}\cdot q^2}{2\times\text{N}\cdot q^2+\text{N}(2q)}\\
=\frac{q}{q+1}$$

Since $q \ll 1$, we can approximate $\frac{q}{q+1}$ with just $q$. In other words, the fraction of allele $a$ in homozygotes is $7\times10^{-3}$. The vast majority of the alleles ($1-(7\times10^{-3})=99.3\%$) are in heterozygotes.
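
The albinism numbers can be checked with a short Python sketch. It assumes, as in the text, a single recessive allele in Hardy-Weinberg equilibrium; the function name is hypothetical, and the code keeps $p = 1-q$ instead of using the approximation $p \approx 1$.

```python
import math


def rare_recessive_summary(incidence):
    """Return (q, carrier frequency, fraction of a alleles in homozygotes)."""
    q = math.sqrt(incidence)             # f(a/a) = q^2 at equilibrium
    p = 1.0 - q
    carriers = 2 * p * q                 # f(A/a)
    copies_in_homozygotes = 2 * q * q    # two copies per a/a individual
    copies_in_heterozygotes = carriers   # one copy per A/a individual
    fraction = copies_in_homozygotes / (
        copies_in_homozygotes + copies_in_heterozygotes
    )
    return q, carriers, fraction


q, carriers, fraction = rare_recessive_summary(1 / 20000)
print(f"q = {q:.4f}")
print(f"carrier frequency = {carriers:.4f} (about 1 in {round(1 / carriers)})")
print(f"fraction of a alleles in homozygotes = {fraction:.4f}")
```

Without the approximation, the fraction of $a$ alleles in homozygotes works out to exactly $q$, because $\frac{2q^2}{2q^2+2pq}=\frac{q}{q+p}$ and $p+q=1$.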

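A basic ethics lesson we can get from this simple example is that eugenics is an exercise in futility. Because the vast majority of copies of a rare recessive allele are carried by unaffected heterozygotes, recessive alleles can easily exist at relatively high frequencies inside human populations.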