This is an old revision of the document!
Table of Contents
Polymorphisms in human DNA sequences
Genomes and human genetics
The methods of genetic analysis that you have been learning are applicable to to humans. However, we need to combine these genetic principles with an understanding of the physical realities of the human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of. To genetics we will add genomics - the study of entire genomesplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of. Let's revisit an old table we saw in an earlier chapter, with a few additions.
|
Of course, to many people humans are the organism in which they are most interested, either for their intrinsic interest in themselves or for biomedical applications. Humans are also more difficult to study compared to the other model organisms we have discussed in this book. The human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of is much larger and less geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) dense than invertebrates such as Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research. or C. elegansplugin-autotooltip__default plugin-autotooltip_bigCaenorhabditis elegans: a non-parasitic non-pathogenic nematode species used as a model organism in genetics researhc.. More importantly, we cannot study human genetics by designing experiments using true-breedingplugin-autotooltip__default plugin-autotooltip_bigTrue breeding: a true breeding strain is one that has been inbred for multiple generations. We assume that the vast majority of loci are homozygous in a true breeding strain. mutantsplugin-autotooltip__default plugin-autotooltip_bigMutant: an individual that has a different phenotype than wildtype and likely contains one more mutations that cause this difference. that can generate large quantities of data to test hypotheses prospectively the way we can with model organisms; for ethical reasons we can only study human genetics retrospectively, using whatever meager quantity of data natural human populations provide. Human geneticists therefore rely on non-invasive DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. tests, genetic mappingplugin-autotooltip__default plugin-autotooltip_bigGenetic mapping: a term describing a variety of different experimental approaches used to determine the physical locations of genes on chromosomes., and statistical analyses to make connections between genotypeplugin-autotooltip__default plugin-autotooltip_bigGenotype: the combination of alleles within an organism or strain. When used as a verb, it means to determine the genotype experimentally. and phenotypeplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism.. There are, whoever, some advantages to studying human genetics. First, humans are self-screeningplugin-autotooltip__default plugin-autotooltip_bigScreen: a screen is a process through which a researcher looks through a population of individuals in an attempt to find rare individuals with certain phenotypes, usually with no obvious way to enrich for the rare individuals. contrast to a selection.; they will self-report interesting phenotypesplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. (usually these are genetic diseases). Second, subtle phenotypesplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. may be more recognizable, as affected individuals may be able to talk directly with researchers.
Polymorphisms and mapping
When we have discussed mappingplugin-autotooltip__default plugin-autotooltip_bigGenetic mapping: a term describing a variety of different experimental approaches used to determine the physical locations of genes on chromosomes. mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. in model organisms, map positions can be described in two ways:
- You can describe the genetic mapplugin-autotooltip__default plugin-autotooltip_bigGenetic map: a map that shows gene positions on chromosomes measured in centiMorgans. position of a mutationplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. relative to other known markerplugin-autotooltip__default plugin-autotooltip_bigMarker: an allele of a gene that provides an easily observable phenotype. Markers are usually cloned or least well mapped. They are used as genetic landmarks in various genetic experiments. In some cases, markers do not have easily observable phenotypes and can only be detected using molecular methods (e.g., SNPs or SSRs). mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant., and map distances can be measured using frequency of recombinationplugin-autotooltip__default plugin-autotooltip_bigRecombination: Recombination can have slightly different meanings depending on context:
* In the context of genetic crosses (usually a dihybrid cross or a test cross), recombination refers to the phenomena where the phenotype of the F2 offspring is different than either parent (P generation). $lox$ (i.e., “$white$ is 1.5 m.u. from $yellow$”); or - You can describe the physical location of a mutationplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. using coordinates on DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. (i.e., “individuals that are homozygousplugin-autotooltip__default plugin-autotooltip_bigHomozygous: a state for a diploid organism wherein the two alleles for a gene are identical to each other. for a recessiveplugin-autotooltip__default plugin-autotooltip_bigRecessive: used to describe an allele, usually in comparison to wildtype. Recessive alleles do not exhibit their phenotype when combined with a wildtype allele. disease alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. have a different DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. at position 3563562 on Chromosomeplugin-autotooltip__default plugin-autotooltip_bigChromosome: a structure that organizes dsDNA in a cell through interactions with various DNA binding proteins. III”).
In human genetics, we don't have visible markerplugin-autotooltip__default plugin-autotooltip_bigMarker: an allele of a gene that provides an easily observable phenotype. Markers are usually cloned or least well mapped. They are used as genetic landmarks in various genetic experiments. In some cases, markers do not have easily observable phenotypes and can only be detected using molecular methods (e.g., SNPs or SSRs). mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. such as $white$ and $yellow$ in Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research., and even if we did we could not force humans to do crosses (and even if we could, we would have to wait 20 years to get an F1 generation!). Instead, we reply on DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymorphisms as markersplugin-autotooltip__default plugin-autotooltip_bigMarker: an allele of a gene that provides an easily observable phenotype. Markers are usually cloned or least well mapped. They are used as genetic landmarks in various genetic experiments. In some cases, markers do not have easily observable phenotypes and can only be detected using molecular methods (e.g., SNPs or SSRs).. A polymorphism is simply a difference in DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. at a particular location in the genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of between individuals in a population. These differences can be inside a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), or in between genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymorphisms include substitutions, duplications, deletions, etc. Since polymorphisms are defined as differences in DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. at a specific location, we can use the term locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. to describe or refer to a polymorphism even if it is not part of a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). We can also use alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. to describe different versions of that locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene.. A locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. is said to be polymorphic is two or more allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. are each present at a frequency of at least 1% in a population.
When human individuals with interesting (usually disease-related) phenotypesplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. are found, we can examine their family tree (i.e., their pedigree), see if there are other affected individuals, and determine which polymorphisms segregate with the phenotypeplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. at a greater frequency than random. In other words, we are trying to identify which polymorphisms are linkedplugin-autotooltip__default plugin-autotooltip_bigLinkage: two loci are linked to each other if they are less than 50 m.u. apart. Two loci are unlinked if they are either (1) greater than 50 m.u. apart on the same chromosome, or; (2) are on separate chromosomes. with the disease phenotypeplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism.. Since we know the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. coordinates of the polymorphisms, it follows that the mutationplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. causing the disease phenotypeplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. must be nearby. The strength of the linkageplugin-autotooltip__default plugin-autotooltip_bigLinkage: two loci are linked to each other if they are less than 50 m.u. apart. Two loci are unlinked if they are either (1) greater than 50 m.u. apart on the same chromosome, or; (2) are on separate chromosomes. must be assessed using statistical analysis. Thus, human genetics is a collaboration between classical Mendelian genetics, molecular biologyplugin-autotooltip__default plugin-autotooltip_bigMolecular biology: the study of nucleic acids, specifically DNA and RNA., and biostatistics.
Two types of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymorphisms are of particular importance in human genetics: single nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. polymorphisms (SNPs) and simple sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. repeats (SSRs).
Single nucleotide polymorphisms (SNPs)
A single nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. polymorphism, or SNP (pronounced “snip”), simply means a polymorphism that consists of a single nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs.. For instance, the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. at position 3563562 on chromosomeplugin-autotooltip__default plugin-autotooltip_bigChromosome: a structure that organizes dsDNA in a cell through interactions with various DNA binding proteins. III for might be a G-C pair for 80% of all humans, but an A-T pair for the remaining 20%. This difference is a SNP.
How frequently are SNPs found? All humans are 99.9% identical at the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. level. This means that on average, at a randomly selected locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene., two randomly selected human allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. will differ at a frequency of 0.001. This implies that your maternal genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of (the haploidplugin-autotooltip__default plugin-autotooltip_bigHaploid: a term that describes a cell or organism that has only one copy of genetic information. Haploid cells typically arise from meiosis (or mitosis of a haploid mother cell). genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of that you inherited from your mother) differs from your paternal genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of (inherited from your father) at about 1 bp per 1000. Since the human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of is 3×109 bp long, this means that two randomly selected individuals from a human population will differ at several million lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene..
The vast majority (probably 99%) of SNPs are selectively “neutral” changes of little or no functional consequence. This is mostly because they likely exist outside coding or geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) regulatory regions (>97% of human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of). They can also be silent substitutions in coding sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids., or amino acidplugin-autotooltip__default plugin-autotooltip_bigAmino acid: molecules that are polymerized to form proteins. substitutions that do not affect proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. stability or function. A small minority of SNPs are of functional consequence and are selectively advantageous or disadvantageous; this can affect the alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. frequency of these SNPs.
SNPs can be detected in a variety of ways. Conceptually, all you need to detect a SNP is knowledge of the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. surrounding the SNP. Armed with this information, you could use PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. to amplify a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragment that contains the SNP from individuals and sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. (chap. 7). In practice, you would likely use a device called a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. microarray that is specifically designed to detect SNPs. Or, you could simply use NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA technology to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. the genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of of the individual to identify SNPs.
Simple sequence repeats (SSRs)
Simple sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. repeats (SSRs) also go by a variety of other names: microsatellites, short tandem repeats (STRs), simple sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. length polymorphisms (SSLPs), or variable number of tandem repeats (VNTRs). These are lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. where typically a dinucleotide sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids., such as CA or CG, is repeated with different repeat numbers in different allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence.. In mammals, the most common type of SSR is CA repeats (MAKE A FIGURE). For example, a particular SSR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. might have six different allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence., each with a different number of CA repeats. The human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of contains on the order of 50.000 - 100,000 dinucleotide SSRs. While SSRs are about 10x less dense than SNPs, they are usually much easier to detect.
SSRs in noncoding regions typically do not affect geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) function, and therefore are usually not under any kind of selectionplugin-autotooltip__default plugin-autotooltip_bigSelection: There are two distinct but somewhat related definitions for this term:
In model organism research, a selection is a process through which a researcher is attempting to find rare individuals with certain phenotypes and has some way of enriching for the rare individuals by killing off all other individuals that do not match the search criteria. Contrast to a. This means that SSR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. can accumulate mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. which leads to different allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. in a population. The different allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. will vary in length, and this difference in length can be detected by PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. (Chapter 07). To distinguish between different SSR allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence., a researcher would use primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating that flank the SSR; the length of the PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. amplification product can then be determined by electrophoresisplugin-autotooltip__default plugin-autotooltip_bigElectrophoresis: an experimental technique that is used to separate charged molecules (such as nucleic acids) by size by applying an electric field to cause the charged molecules to move through a sheet of a gel-like material., which then defines a specific alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence..

SSRs can be used as markersplugin-autotooltip__default plugin-autotooltip_bigMarker: an allele of a gene that provides an easily observable phenotype. Markers are usually cloned or least well mapped. They are used as genetic landmarks in various genetic experiments. In some cases, markers do not have easily observable phenotypes and can only be detected using molecular methods (e.g., SNPs or SSRs). for any kind of mappingplugin-autotooltip__default plugin-autotooltip_bigGenetic mapping: a term describing a variety of different experimental approaches used to determine the physical locations of genes on chromosomes. of human genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), but they are commonly used in forensics. The Federal Bureau of Investigation (FBI) maintains a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. database called the Combined DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. Index System (CODIS) that contains data on a core set of SSR allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. from convicted offenders or arrestees of various crimes. Prior to 2017, 13 STR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. were used in CODIS entries; since 2017, an additional 7 STR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. have been added. These lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. are chosen such that they are unlinkedplugin-autotooltip__default plugin-autotooltip_bigLinkage: two loci are linked to each other if they are less than 50 m.u. apart. Two loci are unlinked if they are either (1) greater than 50 m.u. apart on the same chromosome, or; (2) are on separate chromosomes. from each other. This maximizes their utility in identifying unique individuals. This technique of using STR alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. combinations to identify individuals is called DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fingerprinting.
A consequence of SSR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. being neutral and not under selectionplugin-autotooltip__default plugin-autotooltip_bigSelection: There are two distinct but somewhat related definitions for this term:
In model organism research, a selection is a process through which a researcher is attempting to find rare individuals with certain phenotypes and has some way of enriching for the rare individuals by killing off all other individuals that do not match the search criteria. Contrast to a is that these lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. are usually in Hardy-Weinberg equilibrium (Chapter 18). This allows forensic scientists to use the principles of population genetics to calculate alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. frequencies. When the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. of a suspect matches forensic evidence at a crime scene, alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. frequency (together with information on SSR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. mutationplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. rate) allows forensic scientists to calculate the likelihood that the combination of SSR allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. found in evidence matches that of the suspect is due to random chance. For instance, let's say there 11 different allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. at an SSR locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene.. Let's say that alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. 1 has a frequency of 0.5 and alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. 2 has a frequency of 0.2; the remaining allelesplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. are more rare and make up the remaining 0.3. The likelihood of a random individual in the population being heterozygousplugin-autotooltip__default plugin-autotooltip_bigHeterozygous: a state for a diploid organism wherein the two alleles for a gene are different from each other. at this locusplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. for alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. 1 and 2 is $0.5 \times 0.2 = 0.1$ or 10. Let's just guesstimate that the likelihood for a random match for most SSR lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. is also about 10% (or 0.1). If a forensic investigator compares 13 different lociplugin-autotooltip__default plugin-autotooltip_bigLocus (plural form: loci): a physical location of a gene; often used as a synonym for a gene. and gets a perfect match to a suspect, the likelihood that this match is due to random chance is $0.1^{13}$, or one in a trillion! This is why DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fingerprinting is such a powerful method for law enforcement.
Example of using polymorphisms to map a human mutation: hypolactasia
The digestion of lactose requires the enzymeplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. lactase-phlorizin hydrolase (LPH), which is produced by cells in the intestine and is encoded by the $LCT$ geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). When the intestines lack LPH, undigested lactose will ferment in the gut, leading to diarrhea. Adult-type hypolactasia is a genetic condition where individuals are able to digest lactose as children but lose the ability as they get older. This condition is also known as lactase non-persistence, or lactose intolerance. It is estimated that 68% of the world population is lactose intolerant. There are medical tests that can be done to determine if you are lactose intolerant, but they are inaccurate. Knowing the underlying genetic reasons for why some people have adult-type hypolactasia might give doctors the ability to administer an more accurate genetic test to determine whether they are lactose intolerant.
In 2002, a team of Finnish scientists set out to use human genetics methods to identify mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. that are associated with hypolactasia. They reasoned that mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. that affect individuals are probably not in the proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes.-coding region of $LCT$, since these individuals could digest lactose as children (they also knew from other studies that there were no mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. in the $LCT$ geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) of individuals that had hypolactasia). Instead, they believed that there may be mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. in nearby cisplugin-autotooltip__default plugin-autotooltip_bigCis and trans: In genetics, cis and trans are terms used to describe the relative physical locations of genes or genetic elements. If two genes are in cis, this means that they are physically located on the same DNA molecule. If two genes are in trans, this means that they are physically located on two different-acting regulatory sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. (see Chap. 13) that control the expressionplugin-autotooltip__default plugin-autotooltip_bigExpression: a term used to describe the idea that the function of a gene is apparent and can be observed. Genes may not always be expressed all the time in all places. of $LCT$, such that it is no longer expressedplugin-autotooltip__default plugin-autotooltip_bigExpression: a term used to describe the idea that the function of a gene is apparent and can be observed. Genes may not always be expressed all the time in all places. in adults (they also had some other evidence to support this idea).
The scientists examined the pedigrees of nine Finnish families with a history of hypolactasia (Fig xxx: NOTE: WAITING FOR PERMISSION FROM HHMI TO USE THE FIGURES). From the pedigrees, you can see that the inheritance pattern is consistent with hypolactasia being an autosomalplugin-autotooltip__default plugin-autotooltip_bigAutosome: any chromosome that is not a sex chromosome. recessiveplugin-autotooltip__default plugin-autotooltip_bigRecessive: used to describe an allele, usually in comparison to wildtype. Recessive alleles do not exhibit their phenotype when combined with a wildtype allele. mutationplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant.. The scientists then collected DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. samples from volunteers in these families and analyzed various polymorphisms. They utilized seven SSRs that flanked the $LCT$ geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) on either side. They found strong statistical evidence (see Chap. 22) for linkageplugin-autotooltip__default plugin-autotooltip_bigLinkage: two loci are linked to each other if they are less than 50 m.u. apart. Two loci are unlinked if they are either (1) greater than 50 m.u. apart on the same chromosome, or; (2) are on separate chromosomes. of hypolactasia to an SSR upstreamplugin-autotooltip__default plugin-autotooltip_bigUpstream/downstream: These descriptors have different meanings depending on context:
* In genetics, these are terms used to describe directions on DNA, usually relative to the transcription start site of a gene. DNA sequences that are located in the same direction as the direction of of the $LCT$ geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) - consistent with it being a regulatory mutantplugin-autotooltip__default plugin-autotooltip_bigMutant: an individual that has a different phenotype than wildtype and likely contains one more mutations that cause this difference. instead of a coding mutantplugin-autotooltip__default plugin-autotooltip_bigMutant: an individual that has a different phenotype than wildtype and likely contains one more mutations that cause this difference.. They then used SNPs to further narrow down the genetic interval for the mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. to a 47 kb region of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. upstreamplugin-autotooltip__default plugin-autotooltip_bigUpstream/downstream: These descriptors have different meanings depending on context:
* In genetics, these are terms used to describe directions on DNA, usually relative to the transcription start site of a gene. DNA sequences that are located in the same direction as the direction of of $LCT$. In this region, they found many DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymorphisms among the family members, but only two SNPs that showed complete co-segregation with the hypolactasia trait. Subsequent reverse geneticplugin-autotooltip__default plugin-autotooltip_bigReverse genetics: an approach to studying genes wherein a researcher starts with knowledge of the physical identity of a gene (i.e., the DNA sequence of the gene) but does not know its function. In reverse genetics, the researcher uses various molecular genetic tools to create modified alleles that are reintroduced into an organism, with the goal of trying to deduce the function of the studies in mice and cultured human cells suggest that these two SNPs may be mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. that affect the ability of a transcription factorplugin-autotooltip__default plugin-autotooltip_bigTranscription factor: a generic term that describes any DNA binding protein that affects transcription (can havea positive or negative effect on transcription). called OCT-1 to bind, thereby affecting the expressionplugin-autotooltip__default plugin-autotooltip_bigExpression: a term used to describe the idea that the function of a gene is apparent and can be observed. Genes may not always be expressed all the time in all places. of $LCT$ in adults.