Differences

This shows you the differences between two versions of the page.

--- chapter_18 [2024/09/08 16:05] – [Genetic drift/founder effect] mike
+++ chapter_18 [2024/12/24 04:51] (current) – [An example of calculating allele frequency in humans: albinism] mike
@@ Line 1: / Line 1: @@
+<- chapter_17|Chapter 17^table_of_contents|Table of Contents^chapter_19|Chapter 19 ->
 <typo fs:x-large> Chapter 18. The Hardy-Weinberg equilibrium</typo>
-Until now, we have been carrying out genetic analysis of individuals, This approach works well when you are studying a model organism, such as Drosophila or mice. It doesn't work well when you are interested in the genetics of organisms that cannot be studied in this way. For the next several chapters, we will consider genetics from the point of view of groups of individuals, or populations. We will treat this subject entirely from the perspective of human population studies where population genetics is used to get the type of information that would ordinarily be obtained by breeding experiments in model organisms.
+In earlier chapters, we have been carrying out genetic analysis using controlled crosses where the genotypes are known and we can breed organisms however we want. This approach works well when you are studying a model organism, such as Drosophila or mice. It doesn't work well when you are interested in the genetics of organisms that cannot be studied in this way. For the next several chapters, we will consider genetics from the point of view of populations. We will treat this subject entirely from the perspective of human population studies where population genetics is used to get the type of information that would ordinarily be obtained by breeding experiments in model organisms.
+In the next several chapters, we will use a substantial amount of math. To avoid confusion with fractions, we will write genotypes as $A/a$ instead of $\frac{A}{a}$.
 ===== The Hardy Weinberg equilibrium =====
-At the heart of population genetics is the concept of allele frequency. Consider a human gene with two alleles:$A$ and $a$. The frequency of $A$ is $f(A)$ ; the frequency of a is $f(a)$. We define the following symbols:
+At the heart of population genetics is the concept of allele frequency. Consider a human gene with two alleles: $A$ and $a$. The frequency of $A$ is $f(A)$; the frequency of $a$ is $f(a)$. We define the following symbols:
 <figure Fig1>
@@ Line 29: / Line 33: @@
 <figure Fig3>
-$$f(\frac{A}{A}) + f(\frac{A}{a}) + f(\frac{a}{a}) = 1$$
+$$f(A/A) + f(A/a) + f(a/a) = 1$$
 <caption>
-Three possible genotypes with two alleles, and how we represent their genotype frequencies. All possible genotype frequencies must add up to one. Note that the "fractions" here inside the $f()$ are not math symbols but genotypes.
+Three possible genotypes with two alleles, and how we represent their genotype frequencies. All possible genotype frequencies must add up to one.
 </caption>
 </figure>
@@ Line 39: / Line 43: @@
 <figure Fig4>
-$$p = f(\frac{A}{A})+\frac{1}{2} f(\frac{A}{a})\\
+$$p = f(A/A)+\frac{1}{2} f(A/a)\\
-q = f(\frac{a}{a})+\frac{1}{2} f(\frac{A}{a})$$
+q = f(a/a)+\frac{1}{2} f(A/a)$$
 <caption>
 Derivation of allele frequencies based on genotype frequencies.
@@ Line 46: / Line 50: @@
 </figure>
-Here is an example: $M$ and $N$ are alleles of a gene that specifies different blood antigens. The alleles are codominant, so a simple blood test can distinguish the three possible genotypes of $\frac{M}{M}$, $\frac{M}{N}$, and $\frac{N}{N}$. A survey of the population reveals that 83% of the population has only the M antigen, 1% of the population only has the N antigen, and 16% of the population has both the M and N antigens. In other words:
+Here is an example: $M$ and $N$ are alleles of a gene that specifies different blood antigens. The alleles are codominant, so a simple blood test can distinguish the three possible genotypes $M/M$, $M/N$, and $N/N$. A survey of the population reveals that 83% of the population has only the $M$ antigen, 1% has only the $N$ antigen, and 16% has both the $M$ and $N$ antigens. In other words:
 <figure Fig5>
-$$f(\frac{M}{M}) = 0.83, f(\frac{M}{N}) = 0.16, f(\frac{N}{N}) = 0.01$$
+$$f(M/M) = 0.83, f(M/N) = 0.16, f(N/N) = 0.01$$
 <caption>
 Observed phenotype frequencies (and therefore genotype frequencies) of hypothetical blood antigens $M$ and $N$.
@@ Line 68: / Line 72: @@
 <figure Fig7>
-$$f(\frac{M}{M}) + f(\frac{M}{N}) + f(\frac{N}{N}) = 1$$
+$$f(M/M) + f(M/N) + f(N/N) = 1$$
 <caption>
-Three possible genotypes with two alleles in a population $M$ and $N$, with their genotype frequencies adding up to one. Note that as before, the "fractions" here inside the $f()$ are not math symbols but genotypes. This figure is essentially the same as Fig. {{ref>Fig3}} except that we are using the codominant alleles $M$ and $N$ in our example here instead of the generic $A$ and $a$ in Fig. {{ref>Fig3}}.
+Three possible genotypes with two alleles in a population $M$ and $N$, with their genotype frequencies adding up to one. This figure is essentially the same as Fig. {{ref>Fig3}} except that we are using the codominant alleles $M$ and $N$ in our example here instead of the generic $A$ and $a$ in Fig. {{ref>Fig3}}.
 </caption>
 </figure>
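The formulas in Fig. {{ref>Fig4}} can be checked numerically. The following is a minimal Python sketch using the survey numbers from the $M$/$N$ example; the function name `allele_frequencies` is our own choice for illustration and is not part of the chapter.

```python
def allele_frequencies(f_mm, f_mn, f_nn):
    """Return (p, q) computed from the three genotype frequencies.

    Each homozygote contributes only one kind of allele; each heterozygote
    contributes half of its alleles to p and half to q.
    """
    total = f_mm + f_mn + f_nn
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must add up to one")
    p = f_mm + 0.5 * f_mn  # frequency of allele M
    q = f_nn + 0.5 * f_mn  # frequency of allele N
    return p, q


# Observed genotype frequencies from the hypothetical M/N survey.
p, q = allele_frequencies(0.83, 0.16, 0.01)
print(f"p = {p:.2f}, q = {q:.2f}")
```

Because every allele in the population is either $M$ or $N$, the two values returned should always add up to one.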
-Now let's think about how the inverse calculation would be performed. That is, how would we derive the genotype frequencies for the "F1" generation from the allele frequencies of the "P" generation((Note that "P" and "F1" are symbols used in Mendelian genetics for model organisms and don't really apply to population genetics. We're using them here just as an analogy.))? To do this we must make an assumption about the frequency of mating of parents with different genotypes. If we assume that the gametes mix at random, we can calculate the compound probabilities of obtaining each possible combination of alleles in the offspring:
+Now let's think about how the inverse calculation would be performed. That is, how would we derive the genotype frequencies for the "F1" generation based on the allele frequencies of the "P" generation? To do this we must make an assumption about the frequency of mating of parents with different genotypes. If we assume that the gametes mix at random, we can calculate the compound probabilities of obtaining each possible combination of alleles in the offspring:
 <table Tab1>
@@ Line 81: / Line 85: @@
 ^   ^  egg  ^^
 ^  sperm  ^  $M$ ($p$)  ^  $N$ ($q$)  ^
-^  $M$ ($p$)  |  $\frac{M}{M}$ ($p^2$  |  $\frac{M}{N}$ ($p \cdot q$)  |
+^  $M$ ($p$)  |  $M/M$ ($p^2$)  |  $M/N$ ($p \cdot q$)  |
-^  $N$ ($q$)  |  $\frac{M}{N}$ ($p \cdot q$ )  |  $\frac{M}{M}$ ($q^2$)  |
+^  $N$ ($q$)  |  $N/M$ ($p \cdot q$ )  |  $N/N$ ($q^2$)  |
 </columns>
 <caption>
@@ Line 93: / Line 97: @@
 <figure Fig8>
 <div centeralign>
-$f(\frac{M}{M}) = p^2$, $f(\frac{M}{N}) = 2pq$, and $f(\frac{N}{N}) = q^2$
+$f(M/M) = p^2$, $f(M/N) = 2pq$, and $f(N/N) = q^2$
 </div>
 <caption>
-Calculating genotype frequencies based on allele frequencies, continued from Table {{ref>Tab1}}. Note that since $p+q=1$ (Fig. {{ref>Fig2}}), it follows that $(p+q)^2=1$ and therefore $p^2+2pq+q^2=1$, which is equivalent to saying that $f(\frac{M}{M}) + f(\frac{M}{N}) + f(\frac{N}{N}) = 1$. In other words, the genotype frequencies of all possible genotypes must add up to one. Does this math look familiar to you?
+Calculating genotype frequencies based on allele frequencies, continued from Table {{ref>Tab1}}. Note that since $p+q=1$ (Fig. {{ref>Fig2}}), it follows that $(p+q)^2=1$ and therefore $p^2+2pq+q^2=1$, which is equivalent to saying that $f(M/M) + f(M/N) + f(N/N) = 1$. In other words, the genotype frequencies of all possible genotypes must add up to one. Does this math look familiar to you?
 </caption>
 </figure>
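As a rough illustration of the random-mating calculation in Table {{ref>Tab1}}, the sketch below computes the expected genotype frequencies from a single allele frequency. The function name `hardy_weinberg_genotypes` and the input value $p=0.91$ (the value implied by the $M$/$N$ survey numbers) are our own choices for this example.

```python
def hardy_weinberg_genotypes(p):
    """Expected genotype frequencies after one round of random mating."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("an allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {
        "M/M": p * p,      # M sperm meets M egg
        "M/N": 2 * p * q,  # M sperm with N egg, or N sperm with M egg
        "N/N": q * q,      # N sperm meets N egg
    }


expected = hardy_weinberg_genotypes(0.91)
for genotype, frequency in expected.items():
    print(f"f({genotype}) = {frequency:.4f}")
print(f"sum = {sum(expected.values()):.4f}")
```

The factor of 2 for the heterozygote reflects the two off-diagonal cells of the table, which produce the same genotype.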
@@ Line 103: / Line 107: @@
 <figure Fig9>
-$$\begin{aligned} p_1 &= f(\frac{M}{M}) + 1/2 f(\frac{M}{N})\\
+$$\begin{aligned} p_1 &= f(M/M) + \frac{1}{2} f(M/N)\\
 &= p^2 + pq\\
 &=p (p +q)\\
@@ Line 121: / Line 125: @@
 <table Tab2>
 <columns 100% *100%*>
-^    ^  $\frac{M}{M}$  ^  $\frac{M}{N}$  ^  $\frac{N}{N}$  ^  $p$  ^  $q$  ^
+^    ^  $M/M$  ^  $M/N$  ^  $N/N$  ^  $p$  ^  $q$  ^
 ^  U.S. Caucasians  |  0.29  |  0.5  |  0.21  |  0.54  |  0.46  |
 ^  American Inuit  |  0.84  |  0.16  |  0.008  |  0.92  |  0.08  |
@@ Line 130: / Line 134: @@
 </table>
-Although the allele frequencies are quite different, both populations have the genotype frequencies and allele frequencies that fit the Hardy-Weinberg equilibrium. For instance, in the U.S. Caucasian population, the frequencies of $\frac{M}{M}$, $\frac{M}{N}$, and $\frac{N}{N}$ are obtained by surveying the population (this is your primary data). From these frequencies, we can calculate $p$ and $q$ using the formula in Fig. {{ref>Fig4}}.
+Although the allele frequencies are quite different, both populations have the genotype frequencies and allele frequencies that fit the Hardy-Weinberg equilibrium. For instance, in the U.S. Caucasian population, the frequencies of $M/M$, $M/N$, and $N/N$ are obtained by surveying the population (this is your primary data). From these frequencies, we can calculate $p$ and $q$ using the formula in Fig. {{ref>Fig4}}.
 Now let's consider two sample populations that have the same allele frequencies but have different genotype frequencies.
@@ Line 136: / Line 140: @@
 <table Tab3>
 <columns 100% *100%*>
-^    ^  $\frac{M}{M}$  ^  $\frac{M}{N}$  ^  $\frac{N}{N}$  ^  $p$  ^  $q$  ^
+^    ^  $M/M$  ^  $M/N$  ^  $N/N$  ^  $p$  ^  $q$  ^
 ^  population 1  |  0.2  |  0.2  |  0.6  |  0.3  |  0.7  |
 ^  population 2  |  0.09  |  0.42  |  0.49  |  0.3  |  0.7  |
@@ Line 145: / Line 149: @@
 </table>
-Based on the observed genotype frequencies of population 1, we can calculate $p$ and $q$, and we can further derive $p^2=0.09$, $2pq=0.42$, and $q^2=0.49$. This clearly does not match the observed genotype frequencies of $\frac{M}{M}=0.2$, $\frac{M}{N}=0.2$, and $\frac{N}{N}=0.6$, which is what we predict the $\frac{M}{M}$ genotype frequency would be based on allele frequency $p$. $p^2$ is greater than the observed genotype frequency $f(\frac{M}{M})$ — suggesting something is happening to cause the $M$ allele to be underrepresented after a generation of breeding. Population 1 is not in a Hardy-Weinberg equilibrium.
+Based on the observed genotype frequencies of population 1, we can calculate $p$ and $q$, and we can further derive $p^2=0.09$, $2pq=0.42$, and $q^2=0.49$. These clearly do not match the observed genotype frequencies of $f(M/M)=0.2$, $f(M/N)=0.2$, and $f(N/N)=0.6$. The predicted $p^2$ is less than the observed $f(M/M)$, and the predicted $2pq$ is more than double the observed $f(M/N)$, suggesting that something is causing heterozygotes to be underrepresented relative to what random mating would produce. Population 1 is not in Hardy-Weinberg equilibrium.
-We can do the same things for population 2. Even though the observed genotype frequencies are different, the allele frequencies $p$ and $q$ wind up being the same as population 1. And when we now use the allele frequencies to derive $p^2$, $2pq$, and $q^2$, we find they match the observed genotype frequencies. Therefore, population 2 is in a Hardy-Weinberg equilibrium.
+We can do the same analysis for population 2. Even though the observed genotype frequencies are different, the allele frequencies $p$ and $q$ wind up being the same as population 1. And when we now use the allele frequencies to derive $p^2$, $2pq$, and $q^2$, we find they match the observed genotype frequencies. Therefore, population 2 is in Hardy-Weinberg equilibrium.
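The comparison described for populations 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `hw_deviation` and the cutoff `TOLERANCE` are arbitrary choices of ours, and with real survey counts one would normally use a statistical test (such as a chi-square test) rather than a fixed threshold.

```python
def hw_deviation(f_mm, f_mn, f_nn):
    """Largest gap between observed and Hardy-Weinberg genotype frequencies."""
    p = f_mm + 0.5 * f_mn  # allele frequency of M from the observed genotypes
    q = 1.0 - p
    expected = (p * p, 2 * p * q, q * q)
    observed = (f_mm, f_mn, f_nn)
    return max(abs(o - e) for o, e in zip(observed, expected))


# Observed genotype frequencies (M/M, M/N, N/N) from the table above.
populations = {
    "population 1": (0.2, 0.2, 0.6),
    "population 2": (0.09, 0.42, 0.49),
}

TOLERANCE = 0.01  # arbitrary cutoff chosen for this sketch

for name, frequencies in populations.items():
    deviation = hw_deviation(*frequencies)
    verdict = "fits" if deviation < TOLERANCE else "does not fit"
    print(f"{name}: max deviation {deviation:.2f}, {verdict} Hardy-Weinberg")
```

Both populations yield the same $p$ and $q$, so the difference in the verdict comes entirely from how the alleles are distributed among genotypes.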
-Here is a graph showing the relationship between allele and genotype frequencies for genes that are in a Hardy-Weinberg equilibrium:
+Here is a graph showing the relationship between allele and genotype frequencies for genes that are in Hardy-Weinberg equilibrium:
 <figure Fig10>
 {{ :hw_equilibrium.png?400 |}}
 <caption>
-Relationship between allele frequency $p$ (and $q$) and genotype frequency for genes in a Hardy-Weinberg equilibrium.
+Relationship between allele frequency $p$ (and $q$) and genotype frequency for genes in Hardy-Weinberg equilibrium.
 </caption>
 </figure>
@@ Line 162: / Line 166: @@
 ===== Hardy-Weinberg vs. the real (human) world =====
-Our discussion above of the Hardy-Weinberg equilibrium assumes that there is random mating in human populations. That is to say, in order for a population to be in a Hardy-Weinberg equilibrium, there must be random mating within the population. How good is the random mating assumption in actual human populations? These are some of the conditions that affect random mating assumption and therefore may affect H-W equilibrium:
+Our discussion above of the Hardy-Weinberg equilibrium assumes that there is random mating in human populations. That is to say, in order for a population to be in Hardy-Weinberg equilibrium, there must be random mating within the population. How good is the random mating assumption in actual human populations? These are some of the conditions that affect the random mating assumption and therefore may affect whether a population can be in Hardy-Weinberg equilibrium:
 ==== Genotypic effects on choice of mating partner ====
-Examination of allele frequencies and genotype frequencies for most genes in the human populations reveals that they closely fit a Hardy-Weinberg equilibrium. The implication is that in general, humans choose their mates at random with respect to individual genes and alleles. This may seem odd given that personal experience says that choosing a mate is anything but random. However the usual criteria for choosing mates such as character, appearance, and social position are largely not determined genetically and, to the extent that they are genetically determined, these are all very complex traits that are influenced by a large number of different genes. The net result is that our decision of with whom we have children does not in general systematically favor some alleles over others.
+Examination of allele frequencies and genotype frequencies for most genes in human populations reveals that they closely fit Hardy-Weinberg equilibrium. The implication is that in general, humans choose their mates at random with respect to individual genes and alleles. This may seem odd given that personal experience says that choosing a mate is anything but random. However, the usual criteria for choosing mates such as character, appearance, and social position are largely not determined genetically and, to the extent that they are genetically determined, these are all very complex traits that are influenced by a large number of different genes. The net result is that our decision of with whom we have children does not in general systematically favor some alleles over others.
-One of the exceptional conditions that produce a population that is not in a Hardy-Weinberg equilibrium is known as assortative mating, which means preferential mating between similar individuals. For example, individuals with inherited deafness have a relatively high probability of having children together. But even this type of assortative mating will only affect the genotype frequencies related to deafness.
+One of the exceptional conditions that produce a population that is not in Hardy-Weinberg equilibrium is known as assortative mating, which means preferential mating between similar individuals. For example, individuals with inherited deafness have a relatively high probability of having children together. But even this type of assortative mating will only affect the genotype frequencies related to deafness.
@@ Line 185: / Line 189: @@
 To see how this would happen, consider a gene in a very large population with a single major dominant allele $A$ and 10 minor recessive alleles $a_1$, $a_2$, $a_3$ ... $a_{10}$ with allele frequencies $f(a_1) = f(a_2) = f(a_3) = \dots = 10^{-4}$ and $f(A)=0.999$. Now imagine that a group of 500 individuals from this population moves to an island to start a new population.
-The aggregate frequency of recessive alleles in this population is $a_n = 10\times10^{-4}=10^{-3}$. We can calculate the probability that $n$ number of individuals are carriers (either $\frac{a_n}{a_n}$ or $\frac{A}{a_n}$) as:
+The aggregate frequency of recessive alleles in this population is $a_n = 10\times10^{-4}=10^{-3}$. We can calculate the probability that exactly $n$ individuals are carriers (either $a_n/a_n$ or $A/a_n$) as:
 <figure Fig11>
@@ Line 213: / Line 217: @@
 The way to interpret Table {{ref>Tab4}} is to say, "The probability that there is exactly one carrier in the population is 0.368", "The probability that there are exactly two carriers in the population is 0.184", etc. The likelihood that there will be at least one carrier is actually pretty good; that is given as $1-(q^2)^{500} = 0.632$. Since there are 10 recessive alleles, the minimum number of carriers you would need to have all 10 alleles move to the new population would be $n=5$, since humans are diploid. But the likelihood of there being 5 carriers in the population is 0.003, or less than 1%. Because the alleles are rare, it's much more likely that only one or two carriers will be part of the migration. Also, it's highly unlikely that any of the carriers will be homozygous for any of the recessive alleles (or even carry two different recessive alleles).
-Let's say allele $a_1$ is brought over by a single individual with genotype $\frac{A}{a_1}$. This means that alleles $a_2$ through $a_{10}$ are now lost in the new population. The new allele frequency for $a_1$ is now 1 in 1000, or $f(a_1)=10^{-3}$; an increase of 10-fold! Thus, in a stochastic fashion most of the rare recessive alleles will be lost, whereas an occasional rare allele will experience an increase in frequency. The smaller the founding population the more likely that a rare allele will be lost and the greater the increase in frequency experienced by the alleles that happen to be selected.
+Let's say allele $a_1$ is brought over by a single individual with genotype $A/a_1$. This means that alleles $a_2$ through $a_{10}$ are now lost in the new population. The new allele frequency for $a_1$ is now 1 in 1000, or $f(a_1)=10^{-3}$; an increase of 10-fold! Thus, in a stochastic fashion most of the rare recessive alleles will be lost, whereas an occasional rare allele will experience an increase in frequency. The smaller the founding population the more likely that a rare allele will be lost and the greater the increase in frequency experienced by the alleles that happen to be selected.
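The probabilities quoted from Table {{ref>Tab4}} can be approximated with a short Python sketch. It rests on an assumption of ours rather than on the chapter's own formula: the 500 founders are treated as 1000 independent allele copies, each recessive with probability $10^{-3}$, so the number of recessive copies follows a binomial distribution. Because the alleles are rare, the number of recessive copies is almost always the same as the number of carriers. The constant and function names are hypothetical.

```python
import math

FOUNDERS = 500                # individuals in the founding group
ALLELE_COPIES = 2 * FOUNDERS  # humans are diploid
RECESSIVE_FREQ = 10 * 1e-4    # aggregate frequency of a_1 through a_10


def prob_recessive_copies(k, n=ALLELE_COPIES, freq=RECESSIVE_FREQ):
    """Binomial probability that exactly k of the n allele copies are recessive."""
    return math.comb(n, k) * freq**k * (1 - freq) ** (n - k)


for k in (0, 1, 2, 5):
    print(f"P(exactly {k}) = {prob_recessive_copies(k):.3f}")

at_least_one = 1 - prob_recessive_copies(0)
print(f"P(at least one) = {at_least_one:.3f}")

# If exactly one copy of a_1 arrives, its frequency in the new population:
print(f"new f(a_1) = {1 / ALLELE_COPIES:.0e}")
```

Under this assumption the sketch gives values close to those discussed above, and the last line shows the 10-fold jump from $10^{-4}$ to $10^{-3}$ for the one allele that happens to make the trip.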
 ==== Migration of individuals between different populations ====
-When individuals from populations with different allele frequencies mix, the combined population will be in a Hardy-Weinberg equilibrium after one generation of random mating. The combined population will be out of equilibrium to the extent that mating is assortatative.
+When individuals from populations with different allele frequencies mix, the combined population will be in Hardy-Weinberg equilibrium after one generation of random mating. The combined population will be out of equilibrium to the extent that mating is assortative.
 ===== An example of calculating allele frequency in humans: albinism =====
@@ Line 228: / Line 232: @@
 </div>
-Therefore, $f(\frac{A}{A})=p^2 \approx 1$, $f(\frac{A}{a})=2pq \approx 2q$, and $f(\frac{a}{a})=q^2$. Since most genetic diseases are rare, these approximations are valid for many of the population genetics calculations that are of medical importance.
+Therefore, $f(A/A)=p^2 \approx 1$, $f(A/a)=2pq \approx 2q$, and $f(a/a)=q^2$. Since most genetic diseases are rare, these approximations are valid for many of the population genetics calculations that are of medical importance. Note that by convention, the $q$ symbol is used to represent rare alleles.
+Let's look at a real-life example. Albinism occurs in approximately 1 in 20,000 individuals in humans. Let's say that this condition is due to a recessive allele $a$ of a single gene that is in Hardy-Weinberg equilibrium. Based on this information, we can derive the allele frequency for $a$:
+$$f(a/a) = \frac{1}{20000} = 5\times10^{-5}=q^2\\
+q=\sqrt{5\times10^{-5}}=7\times10^{-3}$$
+And based on this we can also calculate the frequency of heterozygotes in the population:
+$$f(A/a)=2pq\approx 2q = 1.4\times10^{-2}$$
+In other words, approximately 1 in 70 humans are carriers for albinism. We can next calculate the fraction of alleles for albinism that are in individuals that are actually albinos (i.e., their genotype is $a/a$). If we let $\text{N}$ = population size, then the number of alleles in homozygotes will be $2\times\text{N}\cdot q^2$. The number of alleles in heterozygotes (carriers) will be $1\times\text{N} \cdot 2pq \approx \text{N}(2q)$. Therefore, the fraction of $a$ alleles in homozygotes is:
+$$ \frac{2\times\text{N}\cdot q^2}{2\times\text{N}\cdot q^2+\text{N}(2q)}\\
+=\frac{q}{q+1}$$
+Since $q \ll 1$, we can approximate $\frac{q}{q+1}$ with just $q$. In other words, the fraction of allele $a$ in homozygotes is $7\times10^{-3}$. The vast majority of the alleles ($1-(7\times10^{-3})=99.3\%$) are in heterozygotes.
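The albinism arithmetic can be reproduced with a small Python sketch. It assumes, as the text does, a single recessive allele in Hardy-Weinberg equilibrium and an incidence of 1 in 20,000; the variable names are ours.

```python
import math

incidence = 1 / 20000     # f(a/a), the observed frequency of albinism

q = math.sqrt(incidence)  # allele frequency of a, since f(a/a) = q^2
p = 1 - q                 # allele frequency of A

carriers_exact = 2 * p * q  # f(A/a) without the approximation
carriers_approx = 2 * q     # f(A/a) using p close to 1

# Fraction of all a alleles that sit in a/a homozygotes: q / (q + 1).
fraction_in_homozygotes = q / (q + 1)

print(f"q = {q:.4f}")
print(f"carriers (exact)  = {carriers_exact:.4f}")
print(f"carriers (approx) = {carriers_approx:.4f}")
print(f"about 1 in {1 / carriers_exact:.0f} people is a carrier")
print(f"fraction of a alleles in homozygotes = {fraction_in_homozygotes:.4f}")
```

The exact and approximate carrier frequencies differ only in the fourth decimal place, which illustrates why the $p \approx 1$ shortcut is acceptable for rare alleles.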
+A basic ethics lesson we can get from this simple example is that eugenics is an exercise in futility. Recessive alleles can easily exist at relatively high frequencies inside human populations.
-Let's look at a real-life example. Albinism occurs in approximately 1 in 20,000 individuals in humans. Let's say that this condition is due to a recessive allele $a$ of a single gene that is in a Hardy-Weinberg equilibrium. Based on this information, we can derive the allele frequency for $a$: