chapter_22
Differences
This shows you the differences between two versions of the page.
chapter_22 [2024/11/25 15:38] – [Concept 2: informativeness] mike | chapter_22 [2024/11/25 18:34] (current) – [LOD score formula for unknown phase] mike | ||
---|---|---|---|
Line 94: | Line 94: | ||
<columns 100% *100%*> | <columns 100% *100%*> | ||
- | ^ | + | ^ genotype |
^ expected | ^ expected | ||
^ observed | ^ observed | ||
</ | </ | ||
< | < | ||
- | placeholder. note here that expected is the same as null hypothesis. also note that SNPs are determined by PCR. note the smaller sample size compared to the drosophila exp. | + | placeholder. note here that expected is the same as null hypothesis. also note that SNPs are determined by PCR. note the smaller sample size compared to the drosophila exp. remind what the dot symbol means. |
</ | </ | ||
</ | </ | ||
Line 148: | Line 148: | ||
Now let us consider a different scenario to illustrate our second concept of informativeness. Let's say that we know the father' | Now let us consider a different scenario to illustrate our second concept of informativeness. Let's say that we know the father' | ||
- | Let's revisit the example in Table {{ref> | + | Let's revisit the example in Table {{ref> |
- | ^ | + | <table Tab4> |
+ | <columns 100% *100%*> | ||
+ | |||
+ | ^ | ||
^ expected | ^ expected | ||
- | ^ observed | + | ^ observed |
- | ^ | + | ^ |
+ | ^ " | ||
+ | </ | ||
+ | < | ||
+ | placeholder, | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | ==== LOD score formula for known phase ==== | ||
+ | |||
+ | We can now derive the LOD score formula for alleles with known phase. We first define the following terms: | ||
+ | |||
+ | * θ = recombination fraction (equivalent to map distance) | ||
+ | * 1-θ = nonrecombinant fraction | ||
+ | * R = number of recombinant chromosomes | ||
+ | * NR = number of non-recombinant chromosomes | ||
+ | |||
+ | The logarithm of odds (LOD) score is given as: | ||
+ | |||
+ | $$ \begin{aligned}\text{LOD score} &= \log{\frac{\text{probability of observed pedigree data given } 0< | ||
+ | &= \log{\frac{\text{probability}(\theta)}{\text{probability}(\frac{1}{2})}}\end{aligned} | ||
+ | $$ | ||
+ | |||
+ | The denominator in this formula reflects our null hypothesis: that two loci are unlinked. When two loci are unlinked, the probability that recombination occurs between them is $\frac{1}{2}$. By extension, the probability that recombination does not occur between them is $1-\frac{1}{2}=\frac{1}{2}$. In order for the data to be consistent with the null hypothesis, every chromosome for which the phase is known would have to occur with probability $\frac{1}{2}$. Therefore: | ||
+ | |||
+ | $$ \text{probability}(\frac{1}{2})=(\frac{1}{2})^R \cdot (\frac{1}{2})^{NR}=(\frac{1}{2})^{R+NR} $$ | ||
+ | |||
+ | The numerator in this formula reflects our experimental hypothesis: that the two loci are linked. When two loci are linked, the probability that recombination occurs between them is θ. By extension, the probability that recombination does not occur between them is (1-θ). In order for the data to be consistent with our experimental hypothesis, every recombinant chromosome for which the phase is known would have to occur with probability θ, and every non-recombinant chromosome for which the phase is known would have to occur with probability (1-θ). Therefore: | ||
+ | |||
+ | $$ \text{probability}(\theta)=\theta^R \cdot (1-\theta)^{NR} $$ | ||
+ | |||
+ | We can thus re-write the LOD score formula as: | ||
+ | |||
+ | $$ \begin{aligned} LOD &= \log{\frac{\theta^R \cdot (1-\theta)^{NR}}{(\frac{1}{2})^{R+NR}}} \\ | ||
+ | &= R\log{\theta} + NR\log{(1-\theta)} + (R+NR)\log{2} \end{aligned} $$ | ||
+ | |||
+ | Let's use the mouse data in Table {{ref> | ||
+ | |||
+ | <table Tab5> | ||
+ | <columns 100% *100%*> | ||
+ | |||
+ | ^ θ value ^ LOD score ^ | ||
+ | | 0 | undefined | ||
+ | | 0.05 | 1.74 | | ||
+ | | 0.1 | 2.24 | | ||
+ | | 0.15 | 2.35 | | ||
+ | | 0.3 | 1.82 | | ||
+ | | 0.5 | 0 (by definition) | ||
+ | </ | ||
+ | < | ||
+ | placeholder for LOD score table | ||
+ | </ | ||
+ | </ | ||
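The tabulated values can be checked with a few lines of Python. This is a minimal sketch rather than the chapter's own code: the function name `lod_known_phase` is hypothetical, logarithms are taken in base 10, and the counts R=3 and NR=17 are an assumption (they are not restated in this section, but they reproduce the table when rounded to two decimals).

```python
import math


def lod_known_phase(theta, r, nr):
    """LOD score for known phase.

    Computes log10 of theta^R * (1 - theta)^NR divided by (1/2)^(R + NR),
    written as a sum of logarithms.
    """
    if not 0 < theta < 1:
        # log10(0) is undefined, which is why the table lists theta = 0 as undefined.
        raise ValueError("theta must lie strictly between 0 and 1")
    return (
        r * math.log10(theta)
        + nr * math.log10(1 - theta)
        + (r + nr) * math.log10(2)
    )


# Assumed counts for the mouse cross (not stated in this section).
R, NR = 3, 17

# theta = 0.5 is omitted because the LOD score there is 0 by definition.
for theta in (0.05, 0.1, 0.15, 0.3):
    print(f"theta = {theta:<4}  LOD = {lod_known_phase(theta, R, NR):.2f}")
```

Writing the score as a sum of logarithms instead of a ratio of powers keeps the calculation stable when R and NR are large.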
+ | |||
+ | <figure Fig3> | ||
+ | {{ : | ||
+ | < | ||
+ | placeholder. graph plotting LOD as a function of theta. | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | We can see from Table {{ref> | ||
+ | |||
+ | What if our sample size were bigger? Let's say that instead of 20 offspring, we looked at 40 offspring and obtained R=6 and NR=34. At θ=0.15, the formula gives $\text{LOD} = 6\log{0.15} + 34\log{0.85} + 40\log{2} \approx 4.7$. This puts us over the threshold of 3.3. Based on these data, we can conclude that $m$ and $s$ are linked, and the best-supported estimate is that they are 15 map units apart. | ||
+ | |||
+ | ==== LOD score formula for unknown phase ==== | ||
+ | |||
+ | In mouse experiments, | ||
+ | |||
+ | <figure Fig4> | ||
+ | {{ : | ||
+ | < | ||
+ | placeholder | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | In this example, a family with 5 children is affected by an autosomal dominant mutation $D$. We are testing linkage with marker $m$, of which there are two alleles, $M$ and $m$. $m$ is known to be on the same chromosome as $D$ but it is unknown if they are linked. The genotype of the mother (individual 2) is known to be $\frac{a | ||
+ | |||
+ | <table Tab6> | ||
+ | <columns 100% *100%*> | ||
+ | |||
+ | ^ individual | ||
+ | ^ inferred %%genotype%% | ||
+ | ^ paternal %%chromosome%% | ||
+ | ^ if phase 1 | NR | R | NR | NR | NR | | ||
+ | ^ if phase 2 | R | NR | R | R | R | | ||
+ | ^ maternal %%chromosome%% | ||
+ | </ | ||
+ | < | ||
+ | placeholder. note that the genotype of the $m$ locus can be determined by PCR, whereas the genotype of the $D$ locus is inferred by phenotype. | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | We first define some terms: | ||
+ | |||
+ | * T = total number of informative chromosomes; | ||
+ | * R1 = number of recombinant chromosomes in phase 1 | ||
+ | * NR1 = number of non-recombinant chromosomes in phase 1 | ||
+ | * R2 = number of recombinant chromosomes in phase 2 | ||
+ | * NR2 = number of non-recombinant chromosomes in phase 2 | ||
+ | * θ = recombination fraction (i.e., map distance) between $m$ and $D$ | ||
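As a small illustration of these terms, the counts can be tallied directly from the "if phase 1" and "if phase 2" rows of the table above. This is only a sketch; the variable and function names are hypothetical.

```python
from collections import Counter

# Recombinant (R) / non-recombinant (NR) calls for the five children,
# copied from the "if phase 1" and "if phase 2" rows of the table.
phase_1 = ["NR", "R", "NR", "NR", "NR"]
phase_2 = ["R", "NR", "R", "R", "R"]


def count_calls(calls):
    """Return (number of recombinants, number of non-recombinants)."""
    counts = Counter(calls)
    return counts["R"], counts["NR"]


R1, NR1 = count_calls(phase_1)
R2, NR2 = count_calls(phase_2)
T = len(phase_1)  # total number of informative chromosomes

print(R1, NR1, R2, NR2, T)  # prints: 1 4 4 1 5
```

Every chromosome that is recombinant under one phase is non-recombinant under the other, so R1 = NR2 and NR1 = R2.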
+ | |||
+ | As before, we define the LOD score as: | ||
+ | |||
+ | $$ \begin{aligned}\text{LOD score} &= \log{\frac{\text{probability of observed pedigree data given } 0< | ||
+ | &= \log{\frac{\text{probability}(\theta)}{\text{probability}(\frac{1}{2})}}\end{aligned} | ||
+ | $$ | ||
+ | |||
+ | Since we don't know what the phase is, we must calculate $\text{probability}(\theta)$ for both phases: | ||
+ | |||
+ | $$ \begin{aligned} \text{probability}(\theta)_1 &= \theta^{R1} \cdot (1-\theta)^{NR1} \\ | ||
+ | \text{probability}(\theta)_2 &= \theta^{R2} \cdot (1-\theta)^{NR2} \end{aligned} $$ | ||
+ | |||
+ | And since phase 1 and phase 2 are equally likely, we take the average of both for the LOD score: | ||
+ | |||
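+ | $$ \begin{aligned} \text{LOD} &= \log{\frac{\frac{1}{2}\left[\theta^{R1} \cdot (1-\theta)^{NR1} + \theta^{R2} \cdot (1-\theta)^{NR2}\right]}{(\frac{1}{2})^{T}}} \\ | ||
+ | &= \log{\left[\theta^{R1} \cdot (1-\theta)^{NR1} + \theta^{R2} \cdot (1-\theta)^{NR2}\right]} + (T-1)\log{2} \end{aligned} $$ | ||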
+ | |||
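The averaging step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the chapter's own code: the function name `lod_unknown_phase` is hypothetical, logarithms are taken in base 10, and the two phases are weighted equally, as the text describes.

```python
import math


def lod_unknown_phase(theta, r1, nr1, r2, nr2):
    """LOD score when the phase is unknown.

    Averages the likelihoods under phase 1 and phase 2 (assumed equally
    likely) and compares the result with the unlinked likelihood (1/2)^T.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    if r1 + nr1 != r2 + nr2:
        raise ValueError("both phases must cover the same chromosomes")

    total = r1 + nr1  # T, the total number of informative chromosomes
    p_phase_1 = theta**r1 * (1 - theta) ** nr1
    p_phase_2 = theta**r2 * (1 - theta) ** nr2
    p_linked = 0.5 * (p_phase_1 + p_phase_2)
    p_unlinked = 0.5**total
    return math.log10(p_linked / p_unlinked)


# Counts for the five-child family in the example, evaluated at theta = 0.25.
print(round(lod_unknown_phase(0.25, r1=1, nr1=4, r2=4, nr2=1), 3))
```

At θ=0.5 both phase likelihoods equal $(\frac{1}{2})^T$, so the function returns 0, as in the known-phase case.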
+ | In this example, there are a total of 5 informative chromosomes (R1=1, NR1=4; R2=4, NR2=1). Using the formula, we can calculate that the LOD score peaks at about 0.125 near θ=0.21 (at θ=0.25 it is about 0.118). This does not cross the threshold of LOD=3.3 and therefore is not evidence of linkage. When the phase is unknown, LOD scores will be substantially lower than if the phase were known. If we knew the phase in our example to be phase 1, we could calculate that at θ=0.25, LOD=0.403. That's still not significant, but it's higher. | ||
+ | |||
+ | One last important point on LOD scores is that they are additive. This is because probabilities (odds) are multiplicative, | ||
+ | |||
+ |
chapter_22.1732577913.txt.gz · Last modified: 2024/11/25 15:38 by mike