TNBGGA: The No Bullshit Guide to Genetic Analysis


In earlier chapters, we skipped over statistics when evaluating linkage. For instance, in Chapter 03, we gave the example of a test cross between $shi$ and $vg$:

$\frac{shi}{+} \cdot \frac{+}{vg}$ (F1 progeny) $\times \frac{shi}{shi} \cdot \frac{vg}{vg}$ (tester)

Figure 1: A test cross between $shi$ and $vg$ mutants. This is a reproduction of Chapter 03 Figure 9.

The possible phenotypes from this cross are:

not paralyzed, normal wings
not paralyzed, vestigial wings
paralyzed, normal wings
paralyzed, vestigial wings

We said in Chapter 03 that if the phenotypic ratio of the offspring is 1:1:1:1, then $shi$ and $vg$ are unlinked. Let's hypothetically say that you looked at 10,000 F2 offspring from a test cross and obtained exactly 2,500 offspring of each phenotype combination. In this case, the conclusion is pretty clear: the ratio is 1:1:1:1. However, what if you looked at just 100 F2 offspring and obtained a ratio of 20:23:30:27? That's close to 1:1:1:1, but just how close is it?

In statistics, we assume that there is a “true” value for the quantity you are trying to measure. Because real-world measurements are imprecise, your data contain sampling error that makes them deviate from the “true” value. Statistical tests let you estimate how likely it is that a deviation as large as the one you observed arose from sampling error alone. We first discuss the χ² test, which is commonly used in genetics, using the test cross between $shi$ and $vg$ as our example.

The $\chi^2$ test

The χ² test¹⁾ is a statistical test for categorical data. Examples of categorical data are biological sex (male or female), political affiliation (Democrat or Republican), and species (cat or dog). Categorical data differ from numerical data (e.g., the heights of people in a population, the weights of fruits from a tree, or the speeds of cars on the freeway). Phenotypes can often be treated as categorical data. For instance, in the case of $shi$, a fruit fly is either paralyzed or not paralyzed; in the case of $vg$, a fruit fly has either vestigial wings or normal wings.

Using the χ² test, we can ask: what is the probability that the difference between the 20:23:30:27 ratio we observed in the $shi$ $vg$ test cross and the expected 1:1:1:1 ratio is due to sampling error alone? To perform the χ² test, we first create a table of all the relevant information, called a contingency table:

phenotype	observed no. of offspring ($O$)	expected no. of offspring if unlinked ($E$)	$\frac{(O-E)^2}{E}$
not paralyzed, normal wings	20	$\frac{1}{1+1+1+1}\cdot (20+23+30+27)=25$	$\frac{(20-25)^2}{25}=1$
not paralyzed, vestigial wings	23	$\frac{1}{1+1+1+1}\cdot (20+23+30+27)=25$	$\frac{(23-25)^2}{25}=0.16$
paralyzed, normal wings	30	$\frac{1}{1+1+1+1}\cdot (20+23+30+27)=25$	$\frac{(30-25)^2}{25}=1$
paralyzed, vestigial wings	27	$\frac{1}{1+1+1+1}\cdot (20+23+30+27)=25$	$\frac{(27-25)^2}{25}=0.16$
total	100	100	$\chi^2=\sum{\frac{(O-E)^2}{E}}=2.32$

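Table 1: Contingency table for the $shi$ $vg$ test cross, comparing the observed offspring counts with those expected if the two loci are unlinked.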

In the contingency table (Table 1), the column “observed no. of offspring ($O$)” is simply the data you observed. The column “expected no. of offspring if unlinked ($E$)” is the total sample size of the experiment (in this case, $n=100$) multiplied by the expected fraction of each phenotype class. If $shi$ and $vg$ are unlinked, we expect a 1:1:1:1 ratio of the four possible F2 phenotypes; therefore, we expect $\frac{1}{4} \times 100 = 25$ offspring of each type. The next step is calculating the difference between $O$ and $E$. Since we don't care whether $O$ is larger or smaller than $E$ (we only want the magnitude of the difference), we square the difference to “get rid” of any negative values. We then divide by $E$ to normalize each difference by the expected size of that class.
Finally, we add up the normalized values for all classes; this sum is called the χ² statistic. The χ² statistic lets us calculate a very useful number called the p value.
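The arithmetic in Table 1 can be sketched as a short Python function that takes the observed counts and the expected ratio. This is a minimal illustration rather than a library routine: the name `chi_square_statistic` is hypothetical, and the function assumes every expected count is nonzero.

```python
def chi_square_statistic(observed, ratio):
    """Return the chi-square statistic for observed counts vs. an expected ratio.

    observed: counts for each phenotype class, e.g. [20, 23, 30, 27]
    ratio:    expected ratio for the same classes, e.g. [1, 1, 1, 1]
    """
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    n = sum(observed)            # total sample size
    total_ratio = sum(ratio)
    chi2 = 0.0
    for o, r in zip(observed, ratio):
        e = n * r / total_ratio      # expected count for this class
        chi2 += (o - e) ** 2 / e     # squared deviation, normalized by E
    return chi2


# shi x vg test cross from Table 1
print(round(chi_square_statistic([20, 23, 30, 27], [1, 1, 1, 1]), 2))  # 2.32
```

Passing the ratio rather than precomputed expected counts means the same function also works for other expected ratios, such as 9:3:3:1.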

The p value is the probability of obtaining data that deviate from the “perfect” expected result at least as much as ours do, assuming the deviation is caused by sampling error alone (here, assuming $shi$ and $vg$ are truly unlinked). The smaller the p value, the less likely it is that the deviation from a “perfect” 1:1:1:1 ratio is due to sampling error alone. In other words, a small p value is statistical evidence against the hypothesis that $shi$ and $vg$ are unlinked. p values can be determined using a probability distribution called the χ² distribution. Most of the time you are not asked to calculate the p value from scratch (this uses advanced statistics and calculus); instead, you can look up the p value based on your χ² statistic and another variable called degrees of freedom, which is generally defined as “the number of categories of data minus 1”. However, in our case there is actually only one degree of freedom, because (based on our understanding of meiosis) the frequency of one phenotype class in principle determines the frequencies of the other three, regardless of whether the two mutations are linked or not. We next consult a χ² table (Table 2):

degrees of freedom	$p$ = 0.990	$p$ = 0.975	$p$ = 0.950	$p$ = 0.900	$p$ = 0.100	$p$ = 0.050	$p$ = 0.025	$p$ = 0.010
1	0.000	0.001	0.004	0.016	2.706	3.841	5.024	6.635
2	0.020	0.051	0.103	0.211	4.605	5.991	7.378	9.210
3	0.115	0.216	0.352	0.584	6.251	7.815	9.348	11.34
4	0.297	0.484	0.711	1.064	7.779	9.488	11.14	13.28
5	0.554	0.831	1.145	1.610	9.236	11.07	12.83	15.09
6	0.872	1.237	1.635	2.204	10.64	12.59	14.45	16.81
7	1.239	1.690	2.167	2.833	12.02	14.07	16.01	18.48
8	1.646	2.180	2.733	3.490	13.36	15.51	17.54	20.09
9	2.088	2.700	3.325	4.168	14.68	16.92	19.02	21.67
10	2.558	3.247	3.940	4.865	15.99	18.31	20.48	23.21

Table 2: χ² probability distribution table. Each entry is the χ² statistic at which the p value equals the value at the top of its column.

We look at the row for one degree of freedom and see that our χ² statistic from Table 1, 2.32, falls between the columns for p values of 0.900 and 0.100. (The row for 3 degrees of freedom, which the “categories minus 1” rule would give, places 2.32 in the same range.) Although we cannot determine an exact p value from the table, we know that 0.1<p<0.9. Most scientists use p<0.05 as the threshold for statistical significance (this is an arbitrary choice and in some cases can be problematic; see below). Because our p value is well above 0.05, the deviation of our data from the “perfect” 1:1:1:1 ratio can plausibly be explained by sampling error alone. A more intuitive way to say this is that there is no statistical evidence (at least within our experiment) that $shi$ and $vg$ are linked.
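Table 2 only brackets the p value. For integer degrees of freedom, the upper tail of the χ² distribution has a closed form, so a few lines of Python can give an exact p value and also regenerate entries of Table 2. This is a sketch that uses only the standard library (`math.erfc`, `math.gamma`, `math.exp`). The names `chi2_sf` and `chi2_critical` are our own, and in practice you would normally call a statistics library instead (for example, `scipy.stats.chi2.sf`).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus exp(-x/2) * sum over i = 1 .. (df-1)/2
    # of (x/2)**(i - 1/2) / Gamma(i + 1/2); for df = 1 the sum is empty.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)   # step from Gamma(i + 1/2) to Gamma(i + 3/2)
    return total


def chi2_critical(p, df, lo=0.0, hi=100.0, tol=1e-9):
    """Chi-square value whose upper-tail probability is p, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > p:
            lo = mid   # tail still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# Exact p values for the shi x vg statistic (chi-square = 2.32)
for df in (1, 3):
    print(df, round(chi2_sf(2.32, df), 3))

# Regenerate two entries of Table 2
print(round(chi2_critical(0.025, 1), 3))   # 1 df, p = 0.025
print(round(chi2_critical(0.900, 4), 3))   # 4 df, p = 0.900
```

With either 1 or 3 degrees of freedom, the exact p value for χ²=2.32 falls inside the 0.1<p<0.9 range read from the table.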

The multiple testing problem

Let's now apply the χ² test to a model organism closer to humans, the mouse. One limitation of working with mice (just as with humans) is sample size: it is not as easy to breed large numbers of mice in genetic crosses (although it can be done with enough resources).

Let's say we have a $mutant$ ($m$) mouse with phenotype m (wild type mice are +). $m$ is a recessive mutation. How could you map $m$? One approach is to randomly pick a genetic marker somewhere in the genome and ask the question, “is $m$ linked to this marker?” Let's call this marker $s$, and say that $s$ is a single nucleotide polymorphism (SNP) that distinguishes the $m$ mouse from wild type. The cross to map $m$ might look something like this:

$$ P: \frac{m}{m}\cdot \frac{s}{s} \times \frac{+}{+}\cdot \frac{+}{+} $$

This generates heterozygotes, which we then backcross to the $m$ parent (equivalent to a test cross):

$$ F1: \frac{m}{+} \cdot \frac{s}{+} \times \frac{m}{m}\cdot \frac{s}{s} $$

Among the F2 offspring, we would get the following phenotypes/genotypes:

phenotype, SNP genotype	+, $\frac{s}{+}$	m, $\frac{s}{s}$	+, $\frac{s}{s}$	m, $\frac{s}{+}$
expected no. of offspring	5	5	5	5
observed no. of offspring	10	7	2	1

Table 3: Expected and observed offspring from the backcross used to map $m$. The expected counts correspond to the null hypothesis that $m$ and $s$ are unlinked. SNP genotypes are determined by PCR. Note the smaller sample size ($n=20$) compared to the Drosophila experiment.
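The expected row of Table 3 follows from a simple model of the backcross. In this sketch, which is our own illustration, `r` is a hypothetical parameter for the recombination frequency between $m$ and $s$. Because the F1 carries $m$ $s$ on one chromosome and + + on the other, each parental class is expected at frequency $(1-r)/2$ and each recombinant class at $r/2$, with $r = 0.5$ corresponding to unlinked loci. The model also assumes that all classes are equally viable.

```python
def expected_backcross_counts(n, r):
    """Expected counts for the four backcross classes, in the column order of Table 3.

    n: number of offspring scored
    r: recombination frequency between m and s (0.5 means unlinked)

    Parental gametes from the F1 are (m s) and (+ +), giving the classes
    (m, s/s) and (+, s/+); recombinant gametes (m +) and (+ s) give
    (m, s/+) and (+, s/s).
    """
    parental = n * (1 - r) / 2
    recombinant = n * r / 2
    # Order: (+, s/+), (m, s/s), (+, s/s), (m, s/+)
    return [parental, parental, recombinant, recombinant]


print(expected_backcross_counts(20, 0.5))  # [5.0, 5.0, 5.0, 5.0]
```

With the observed counts in Table 3, the fraction of recombinant offspring is $(2+1)/20 = 0.15$, which is one simple estimate of `r`.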

We can calculate the χ² statistic as follows:

$$ \begin{aligned} \chi^2 &= \sum{ \frac{(O-E)^2}{E}} \\ &= \frac{(10-5)^2}{5} + \frac{(7-5)^2}{5} + \frac{(2-5)^2}{5} + \frac{(1-5)^2}{5} \\ &= 5 + 0.8 + 1.8 + 3.2 \\ &= 10.8 \end{aligned}$$

Using the χ² table (Table 2) with one degree of freedom, we can see that $p$<0.01 for χ²=10.8. In fact, with a table that shows more columns to the right, we would see that 0.001<$p$<0.002. Since the typical cutoff for statistical significance is $p$<0.05, based on the χ² test we would conclude that $m$ is linked to $s$. But is this conclusive evidence, especially given the relatively small sample size?

To model this, let's imagine an infinitely large bag containing 4 different kinds of balls, labeled A, B, C, and D, each present at an equal frequency. If we were to reach into the bag and pull out 20 balls, the expected result would be 5 A, 5 B, 5 C, and 5 D balls. However, it's unlikely we would get exactly 5 of each; there would be some sampling error. For instance, if we drew 4 A, 4 B, 6 C, and 6 D balls, the difference from the expected 5 of each would very likely be due to sampling error alone. The $p$ value we calculated from the data in Table 3 means that the probability of drawing a result that deviates from the expected result as much as 10 A, 7 B, 2 C, and 1 D balls does is around 0.001, or 0.1%. Those sound like pretty good odds against drawing such a lopsided result, until you start repeating the experiment.

Let's put this into numerical perspective. If the probability that a single draw of 20 balls gives a false positive (a result that deviates as far from the expected value as in the example above) is 0.001, then the probability that the draw gives a result reasonably close to the expected value is 0.999. If we draw 20 balls $n$ times (each draw is independent, since the bag of balls is infinite in size), the probability that all $n$ draws give results close to expected is $0.999^n$. This means that in $n$ draws, the probability that at least one draw gives a false positive is $1-0.999^n$. If we set this probability to 10%, or 0.1, we can solve for $n$:

$$ \begin{aligned} 1-0.999^n &= 0.1 \\ 0.999^n &= 0.9 \\ n\log{0.999} &= \log{0.9} \\ n &= \frac{\log{0.9}}{\log{0.999}} \approx 105.3 \end{aligned} $$

In other words, if we were to draw 20 balls about 105 times, there would be roughly a 10% chance that at least one of those draws would deviate from the expected result enough to fool us, even though every draw comes from the same unbiased bag.

Now let's return to mapping $m$. The SNP $s$ that we used was randomly chosen. If the SNP density between two mouse strains were roughly 1 SNP per 1,000 bp of DNA, and the mouse genome is on the order of 10⁹ bp in total, then looking at all SNPs between the $m$ mouse and wild type would mean looking at about a million SNPs! When mapping a mouse mutant, we wouldn't want to limit our mapping experiment to a single SNP like $s$; we would probably want to look at many (or possibly even all) SNPs throughout the genome. But each time we test for linkage between $m$ and a SNP, we are essentially doing the same thing as reaching into our bag and pulling out 20 balls, and the probability of a false positive is ~0.1%. This means that the more SNPs we test for linkage, the greater the chance that we will get data that “show linkage” for a SNP that is really unlinked, simply because of sampling error.
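Here is a back-of-the-envelope version of this argument. It uses the text's round numbers: a genome of ~10⁹ bp, 1 SNP per 1,000 bp, and a 0.1% false-positive rate per SNP. It additionally assumes, for simplicity, that the SNP tests are independent. In reality, nearby SNPs are linked to each other, so real tests are correlated.

```python
# Round numbers taken from the text; treating the SNP tests as independent
# is a simplifying assumption (nearby SNPs are linked to one another).
genome_size_bp = 10**9      # order-of-magnitude size of the mouse genome
snp_spacing_bp = 1_000      # roughly 1 SNP per 1,000 bp between the two strains
alpha = 0.001               # false-positive probability per SNP test (~0.1%)

n_snps = genome_size_bp // snp_spacing_bp          # number of SNP tests
expected_false_positives = n_snps * alpha           # average number of spurious "hits"
p_any_false_positive = 1 - (1 - alpha) ** n_snps    # chance of at least one

print(n_snps)                              # 1000000
print(round(expected_false_positives))     # 1000
print(p_any_false_positive)
```

Even under these idealized assumptions, a genome-wide scan at a per-SNP rate of 0.1% would be expected to produce about a thousand spurious “linked” SNPs.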

This conundrum is known as the multiple testing problem. We can partially alleviate it by increasing the sample size of the experiment, which gives lower $p$ values for SNPs that are truly linked. The $p$ value threshold for statistical significance must also be substantially lower than the “standard” 0.05 if we are doing a genome-wide search for linkage (which is by definition a multiple testing experiment); the general consensus among geneticists is that the cutoff should be $p$<0.0001 or lower.
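One standard way to set such a threshold, which this chapter does not itself use, is the Bonferroni correction: divide the acceptable overall false-positive rate by the number of tests. The sketch below uses hypothetical function names and assumes independent tests. It shows that with 500 tests, an overall rate of 0.05 corresponds to a per-test cutoff of $p$<0.0001. By contrast, keeping the per-test cutoff at 0.05 makes at least one false positive nearly certain.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test cutoff so that P(at least one false positive) <= family_alpha."""
    return family_alpha / n_tests


def family_wise_error(alpha, n_tests):
    """P(at least one false positive) among n independent tests at cutoff alpha."""
    return 1 - (1 - alpha) ** n_tests


n_tests = 500                                         # hypothetical number of independent tests
cutoff = bonferroni_threshold(0.05, n_tests)
print(cutoff)                                         # per-test cutoff of about 0.0001
print(round(family_wise_error(cutoff, n_tests), 4))   # stays just under 0.05
print(round(family_wise_error(0.05, n_tests), 4))     # at least one false positive almost certain
```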

Mapping human mutations using LOD scores

Mapping human mutations is especially challenging, because (1) we cannot control human crosses (at least not in any ethical way), and (2) sample sizes are usually small. We use a tool called the logarithm of odds (LOD) score to help us. To understand LOD scores and how to map human genes, we first need to look at several concepts. By way of example, let's imagine an autosomal dominant human disease caused by allele $D$ of a gene whose molecular identity is unknown. The normal (non-disease) allele is $d$. Let's say an affected man and an unaffected woman have an affected son. The pedigree would look like this:

Concept 1: phase

Let's say we have identified a SNP marker $s$ that is linked to $D$ and has two alleles in the population: $s$ and $S$. Can we deduce the child's genotype at both the $d$ and $s$ loci? First of all, we can easily determine the SNP genotypes of both parents and the child by using PCR and DNA sequencing. We discover that the father is homozygous $\frac{S}{S}$, and both the mother and child are heterozygous $\frac{S}{s}$.

Next, since autosomal dominant traits are rare, we can assume that the father's genotype with regard to the $d$ gene is $\frac{D}{d}$. Since we know the father's genotype at the $s$ SNP locus is $\frac{S}{S}$, the father's combined genotype must be $\frac{S \; D}{S \; d}$. Since the mother is not affected by the disease and the disease allele is dominant, the mother's genotype must be $\frac{S \; d}{s \; d}$. Given that the child is affected, their genotype must therefore be $\frac{S \; D}{s \; d}$: the child can only have received $D$ from the father, every one of whose chromosomes carries $S$, so the child's $s$ allele must have come from the mother on an $s \; d$ chromosome. Given what we know about the parents, the child's genotype cannot be $\frac{S \; d}{s \; D}$. In this case, the disease allele $D$ must be on the same chromosome as the marker allele $S$ in the child, and the normal allele $d$ must be on the same chromosome as the $s$ allele; we say that the phase is known for both chromosomes in the child.
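To see the phase argument worked out mechanically, here is a minimal Python sketch (not part of the original text). It enumerates every chromosome each parent could transmit, including recombinant ones, and keeps only the pairings consistent with the child's SNP genotype and affected status. The helper names `gametes` and `consistent_phases` and the tuple representation of haplotypes are illustrative assumptions, and the sketch assumes $D$ is fully penetrant.

```python
from itertools import product

# Hypothetical representation: a chromosome is a (SNP allele, disease allele) tuple,
# e.g. ("S", "D"). A parent is a pair of such chromosomes with known phase.

def gametes(hap1, hap2):
    """Return every chromosome a parent could transmit: both parental and both recombinant."""
    parental = {hap1, hap2}
    recombinant = {(hap1[0], hap2[1]), (hap2[0], hap1[1])}
    return parental | recombinant

def consistent_phases(father, mother, child_snp, child_affected):
    """List (paternal, maternal) chromosome pairs that match the child's observed data."""
    results = set()
    for pat, mat in product(gametes(*father), gametes(*mother)):
        snp_genotype = sorted([pat[0], mat[0]])
        affected = "D" in (pat[1], mat[1])  # D is dominant; assumes full penetrance
        if snp_genotype == sorted(child_snp) and affected == child_affected:
            results.add((pat, mat))
    return results

father = (("S", "D"), ("S", "d"))  # S D / S d
mother = (("S", "d"), ("s", "d"))  # S d / s d

print(consistent_phases(father, mother, ("S", "s"), True))
# {(('S', 'D'), ('s', 'd'))}
```

Only one pairing survives, matching the conclusion that the child is $\frac{S \; D}{s \; d}$. Because both of the father's chromosomes carry $S$, allowing for recombinant gametes does not change the answer.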

Concept 2: informativeness

Now let us consider a different scenario to illustrate our second concept, informativeness. Let's say that we know the father's genotype is $\frac{S \; D}{s \; d}$ and the mother's genotype is $\frac{S \; d}{s \; d}$. We discover that the affected child is homozygous $\frac{S}{S}$. In this case, we can definitely conclude that the child inherited an $S \; D$ chromosome from their father, and that there must not have been recombination between the $S$ and $D$ loci. We say that the chromosome inherited from the father is informative, because we know for certain whether or not it was a product of recombination; this is possible because the father is heterozygous at both loci and his phase is known. By comparison, the child must have inherited an $S \; d$ chromosome from their mother. This chromosome is uninformative, because we can't tell whether it is a result of recombination between the $S$ and $d$ loci: the mother is homozygous $\frac{d}{d}$, so a recombinant and a nonrecombinant chromosome carrying $S$ would look identical. When we look at individuals in pedigrees, they might have 0, 1, or 2 informative chromosomes.
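The informativeness rule can be sketched the same way. In this hypothetical `classify` helper (an illustration, not a standard tool), a transmitted chromosome counts as informative only when the transmitting parent is heterozygous at both loci and their phase is known; an informative chromosome is then labelled recombinant or nonrecombinant by comparing it with the parent's two chromosomes.

```python
# Hypothetical representation: chromosomes are (SNP allele, disease allele) tuples,
# and the transmitting parent's phase is assumed to be known.

def classify(parent, transmitted):
    """Return 'uninformative', 'recombinant', or 'nonrecombinant' for a transmitted chromosome."""
    hap1, hap2 = parent
    het_snp = hap1[0] != hap2[0]
    het_disease = hap1[1] != hap2[1]
    if not (het_snp and het_disease):
        # A homozygous locus hides any crossover: both outcomes look identical.
        return "uninformative"
    if transmitted in (hap1, hap2):
        return "nonrecombinant"
    return "recombinant"

father = (("S", "D"), ("s", "d"))  # S D / s d
mother = (("S", "d"), ("s", "d"))  # S d / s d

print(classify(father, ("S", "D")))  # nonrecombinant
print(classify(mother, ("S", "d")))  # uninformative
```

Applied to this scenario, the paternal $S \; D$ chromosome comes out nonrecombinant and the maternal $S \; d$ chromosome comes out uninformative, as in the text.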

Let's revisit the example in Table 3. The question we want to ask is: are $m$ and $s$ linked? Let's assume that they are; in that case, we can determine which chromosomes are informative and which are recombinant:

phenotype/genotype	+, $\frac{s}{+}$	m, $\frac{s}{s}$	+, $\frac{s}{s}$	m, $\frac{s}{+}$
expected	5	5	5	5
observed	10	7	2	1
paternal chromosome genotype
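As a rough check on how far the observed counts fall from the expected 1:1:1:1 ratio, a chi-squared goodness-of-fit statistic can be computed from the expected and observed rows. This is a minimal sketch assuming Pearson's statistic, $\chi^2 = \sum \frac{(O - E)^2}{E}$, with $4 - 1 = 3$ degrees of freedom and the standard tabulated critical value of 7.815 at $p = 0.05$; the variable and function names are hypothetical.

```python
# Counts copied from the expected and observed rows of the table.
observed = [10, 7, 2, 1]
expected = [5, 5, 5, 5]

def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_squared(observed, expected)
df = len(observed) - 1       # 4 classes -> 3 degrees of freedom
CRITICAL_P05_DF3 = 7.815     # standard chi-squared table value for p = 0.05, df = 3

print(f"chi-squared = {stat:.1f}, df = {df}")  # chi-squared = 10.8, df = 3
if stat > CRITICAL_P05_DF3:
    print("departure from 1:1:1:1 is significant at p < 0.05")
else:
    print("no significant departure from 1:1:1:1 at p < 0.05")
```

Here $\chi^2 = 5 + 0.8 + 1.8 + 3.2 = 10.8$, which exceeds 7.815. Keep in mind that expected counts of 5 sit right at the usual rule-of-thumb minimum for the chi-squared approximation, so with a sample this small the result should be treated with caution.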

¹⁾ Pronounced “kai-squared test”; χ is a Greek letter that is usually romanized as “chi” but is pronounced “kai”.
