Differences

This shows you the differences between two versions of the page.

--- chapter_22 [2024/11/25 17:36] – [LOD score formula for unknown phase] mike
+++ chapter_22 [2024/11/25 18:34] (current) – [LOD score formula for unknown phase] mike
@@ Line 210: / Line 210: @@
 </table>
-We can see that the LOD score is highest around θ = 0.15, at which LOD = 2.35 (you can calculate a more precise maximal value of LOD using some calculus). Experience tells us that LOD > 3.3 is usually the cutoff point at which the linkage is likely to be real (it corresponds to a 0.05 false positive rate); conversely, LOD < -2 usually means we can rule out linkage. Therefore, we can see that the data do not support linkage between $m$ and $s$, but does not rule it out either.
+<figure Fig3>
+{{ :phase_known_graph.png?400 |}}
+<caption>
+placeholder. graph plotting LOD as a function of theta.
+</caption>
+</figure>
+We can see from Table {{ref>Tab5}} and Figure {{ref>Fig3}} that the LOD score is highest around θ = 0.15, at which LOD = 2.35 (you can calculate a more precise maximum using some calculus). Experience tells us that LOD > 3.3 is usually the cutoff above which linkage is likely to be real (it corresponds to a 0.05 false positive rate); conversely, LOD < -2 usually means we can rule out linkage. Therefore, the data do not support linkage between $m$ and $s$, but do not rule it out either.
 What if our sample size were bigger? Let's say that instead of 20 offspring, we looked at 40 offspring and obtained R=6 and NR=34. At θ=0.15, we can calculate that LOD=4.7. This puts us over the threshold of 3.3. Based on these data, we can conclude that $m$ and $s$ are linked, and the data most strongly support a distance of 15 map units between them.
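The effect of sample size can be checked numerically. The following is a minimal sketch, assuming the usual phase-known likelihood θ^R · (1-θ)^NR and base-10 logarithms; the function name `lod_known_phase` is hypothetical, and R=3, NR=17 for the 20-offspring case is an assumption inferred from the peak near θ = 0.15 rather than a value stated in this section.

```python
import math

def lod_known_phase(r, nr, theta):
    """LOD for phase-known data: log10 of L(theta) / L(1/2)."""
    if not 0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    # Work in log space so large counts do not underflow.
    log_l_theta = r * math.log10(theta) + nr * math.log10(1 - theta)
    log_l_null = (r + nr) * math.log10(0.5)
    return log_l_theta - log_l_null

# Assumed counts for the 20-offspring case (R=3, NR=17), then the doubled sample.
lod_20 = lod_known_phase(3, 17, 0.15)
lod_40 = lod_known_phase(6, 34, 0.15)
print(f"20 offspring: LOD = {lod_20:.2f}")
print(f"40 offspring: LOD = {lod_40:.2f}")
```

Because R and NR both double while θ stays fixed, the second LOD is exactly twice the first.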
@@ Line 216: / Line 223: @@
 ==== LOD score formula for unknown phase ====
-Now let's return to a human example:
+In mouse experiments, the phase of all offspring is usually known. Now, let's return to a human example, where it is more common for the phase to be unknown:
-<figure Fig3>
+<figure Fig4>
 {{ :pedigree_ex2.jpg?400 |}}
 <caption>
 placeholder
 </caption>
+</figure>
-In this example, a family with 5 children is affected by an autosomal dominant mutation $D$. We are testing linkage with marker $m$, of which there are two alleles, $M$ and $m$. $m$ is known to be on the same chromosome as $D$ but it is unknown if they are linked. The genotype of the mother (individual 2) is known to be $\frac{a \; d}{a \; d}$. The genotype of the father (individual 1) is known but the phase is not known; his genotype is either $\frac{A \; D}{a \; d}$ or $\frac{A \; d}{a \; D}$. Based on this information, we can categorize the genotypes and chromosomes of the children:
+In this example, a family with 5 children is affected by an autosomal dominant mutation $D$. We are testing linkage with marker $m$, which has two alleles, $A$ and $a$. $m$ is known to be on the same chromosome as $D$, but it is unknown whether they are linked. The genotype of the mother (individual 2) is known to be $\frac{a \; d}{a \; d}$. The genotype of the father (individual 1) is known but his phase is not, so there are two possibilities: his genotype is either $\frac{A \; D}{a \; d}$ (phase 1) or $\frac{A \; d}{a \; D}$ (phase 2). Based on this information, we can categorize the genotypes and chromosomes of the children:
 <table Tab6>
@@ Line 230: / Line 238: @@
 ^  individual  ^  3  ^  4  ^  5  ^  6  ^  7  ^
-^  inferred genotype without phase information  |  MmDd  |  mmDd  |  MmDd  |  mmdd  |  mmdd  |
+^  inferred %%genotype%%  |  $\frac{A \; D}{a \; d}$  |  $\frac{a \; D}{a \; d}$  |  $\frac{A \; D}{a \; d}$  |  $\frac{a \; d}{a \; d}$  |  $\frac{a \; d}{a \; d}$  |
-^  paternal chromosome
+^  paternal %%chromosome%%  |  $A \; D$  |  $a \; D$  |  $A \; D$  |  $a \; d$  |  $a \; d$  |
+^  if phase 1  |  NR  |  R  |  NR  |  NR  |  NR  |
+^  if phase 2  |  R  |  NR  |  R  |  R  |  R  |
+^  maternal %%chromosome%%  |  not informative  |||||
 </columns>
 <caption>
@@ Line 237: / Line 248: @@
 </caption>
 </table>
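The R/NR calls in the table can be reproduced mechanically: a paternal chromosome is non-recombinant if it matches one of the father's two haplotypes under the assumed phase, and recombinant otherwise. The sketch below uses hypothetical names (`paternal`, `phases`, `classify`) and encodes each chromosome as a (marker allele, disease allele) pair.

```python
# Paternal chromosomes inherited by children 3-7, as (marker allele, disease allele).
paternal = {
    3: ("A", "D"),
    4: ("a", "D"),
    5: ("A", "D"),
    6: ("a", "d"),
    7: ("a", "d"),
}

# The father's two haplotypes under each possible phase.
phases = {
    "phase 1": {("A", "D"), ("a", "d")},
    "phase 2": {("A", "d"), ("a", "D")},
}

def classify(chromosome, parental_haplotypes):
    """Return 'NR' if the chromosome matches a parental haplotype, else 'R'."""
    return "NR" if chromosome in parental_haplotypes else "R"

counts = {}
for name, haplotypes in phases.items():
    calls = [classify(paternal[child], haplotypes) for child in sorted(paternal)]
    counts[name] = (calls.count("R"), calls.count("NR"))
    print(name, calls, "R =", counts[name][0], "NR =", counts[name][1])
```

Swapping the phase flips every call, so the recombinant count under one phase equals the non-recombinant count under the other.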
+We first define some terms:
+  * T = total number of informative chromosomes; T = R+NR
+  * R1 = number of recombinant chromosomes in phase 1
+  * NR1 = number of non-recombinant chromosomes in phase 1
+  * R2 = number of recombinant chromosomes in phase 2
+  * NR2 = number of non-recombinant chromosomes in phase 2
+  * θ = recombination fraction (i.e., map distance) between $m$ and $D$
+As before, we define the LOD score as:
+$$ \begin{aligned}\text{LOD score} &= \log{\frac{\text{probability of observed pedigree data given } 0<\theta<\frac{1}{2}}{\text{probability of observed pedigree data given } \theta=\frac{1}{2}}} \\
+&= \log{\frac{\text{probability}(\theta)}{\text{probability}(\frac{1}{2})}}\end{aligned}
+$$
+Since we don't know what the phase is, we must calculate $\text{probability}(\theta)$ for both phases:
+$$ \begin{aligned}\text{probability}(\theta)_1&=\theta^{R1} \cdot (1-\theta)^{NR1} \\
+\text{probability}(\theta)_2&=\theta^{R2} \cdot (1-\theta)^{NR2}\end{aligned} $$
+And since phase 1 and phase 2 are equally likely, we take the average of both for the LOD score:
+$$ \begin{aligned} \text{LOD} &=\log{\left(\frac{\frac{\theta^{R1} \cdot (1-\theta)^{NR1}+\theta^{R2} \cdot (1-\theta)^{NR2}}{2}}{(\frac{1}{2})^{T}}\right)} \\
+&=(T-1)\log{2}+\log{\left(\theta^{R1} \cdot (1-\theta)^{NR1}+\theta^{R2} \cdot (1-\theta)^{NR2}\right)}\end{aligned}$$
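The step between the two lines of the formula can be spelled out. Here $S(\theta)$ is shorthand introduced only for this derivation (it is not used elsewhere in the chapter) for the sum $\theta^{R1} \cdot (1-\theta)^{NR1}+\theta^{R2} \cdot (1-\theta)^{NR2}$, and $T = R+NR$ as defined above:

$$ \begin{aligned} \text{LOD} &= \log{\left(\frac{S(\theta)}{2}\right)} - \log{\left(\left(\frac{1}{2}\right)^{T}\right)} \\
&= \log{S(\theta)} - \log{2} + T\log{2} \\
&= (T-1)\log{2} + \log{S(\theta)} \end{aligned} $$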
+In this example, there are a total of 5 informative chromosomes (R1=1, NR1=4; R2=4, NR2=1). Using the formula, we can calculate that at θ=0.25 we get LOD ≈ 0.12 (the maximum, LOD ≈ 0.125, lies nearby at θ ≈ 0.21). This does not cross the threshold of LOD=3.3 and therefore is not evidence of linkage. When the phase is unknown, LOD scores will be substantially lower than if the phase were known. If we knew the phase in our example to be phase 1, we could calculate that at θ=0.25, LOD=0.403. That's still not significant, but it's higher.
+One last important point on LOD scores is that they are additive. This is because the odds from independent families are multiplicative, and the logarithm of a product is the sum of the logarithms. In our example above, we calculated the LOD score for linkage between $D$ and $m$ for this family as LOD ≈ 0.12 at θ=0.25. If we obtained data from another family that resulted in LOD=0.29 at θ=0.25, we could then combine the data for the two families as LOD=0.12+0.29=0.41. This makes analyzing the data much easier and more intuitive; the more data you have, the stronger the case you can make for linkage.
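Both points, the penalty for unknown phase and the additivity of LOD scores, can be checked with a short script. This is a sketch under stated assumptions: base-10 logarithms, two equally likely phases with R2 = NR1 and NR2 = R1, and hypothetical function names. The second family's counts (R1=1, NR1=5) are invented purely for illustration and are not data from this chapter.

```python
import math

def odds_known_phase(r, nr, theta):
    """Likelihood ratio L(theta) / L(1/2) when the phase is known."""
    return theta**r * (1 - theta)**nr / 0.5**(r + nr)

def odds_unknown_phase(r1, nr1, theta):
    """Likelihood ratio when both phases are equally likely.

    Phase 2 swaps recombinant and non-recombinant calls, so R2 = NR1 and NR2 = R1.
    """
    return 0.5 * (odds_known_phase(r1, nr1, theta) + odds_known_phase(nr1, r1, theta))

theta = 0.25

# Family from the example: R1=1, NR1=4.
lod_unknown = math.log10(odds_unknown_phase(1, 4, theta))
lod_known = math.log10(odds_known_phase(1, 4, theta))
print(f"unknown phase: LOD = {lod_unknown:.3f}")
print(f"known phase 1: LOD = {lod_known:.3f}")

# Coarse grid search for the theta that maximizes the unknown-phase LOD.
grid = [i / 1000 for i in range(1, 500)]
best_theta = max(grid, key=lambda t: odds_unknown_phase(1, 4, t))
print(f"maximum near theta = {best_theta:.3f}")

# Additivity: multiplying odds across families equals adding their LOD scores.
second_family = (1, 5)  # hypothetical counts, for illustration only
odds_a = odds_unknown_phase(1, 4, theta)
odds_b = odds_unknown_phase(*second_family, theta)
combined_from_product = math.log10(odds_a * odds_b)
combined_from_sum = math.log10(odds_a) + math.log10(odds_b)
print(f"combined LOD = {combined_from_sum:.3f}")
```

Note that scores are only added at a shared value of θ; to find the best-supported map distance for the combined data, the sum would be evaluated across the whole grid.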