User Tools

Site Tools


chapter_07

Chapter 07. Analysis of gene sequences

Although eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) may be generally more interesting to most students, it is useful to first consider bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). Most eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. molecular biologists use bacteriaplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. as tools for various things (e.g., molecular cloningplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
; see Chapter 09), so it’s useful to understand how bacteriaplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. work from a practical perspective. Also, although bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) have some pretty important differences compared to eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), many basic principles are the same.

Bacterial gene structure

This is what a typical bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) might look like schematically.

Figure 1: Figure 7.1. Anatomy of a bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). See Table 1 for descriptions of different parts of this geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). The straight horizontal lineplugin-autotooltip__default plugin-autotooltip_bigStrain or line: refers to a pool or colony of individuals or cultured cells of a desired genotype or phenotype that is mostly homogeneous and can be bred and/or produced in perpetuity for research or commercial purposes. “Strain” tends to be used more for microorganisms and represents double-stranded DNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C., whereas the wavy lineplugin-autotooltip__default plugin-autotooltip_bigStrain or line: refers to a pool or colony of individuals or cultured cells of a desired genotype or phenotype that is mostly homogeneous and can be bred and/or produced in perpetuity for research or commercial purposes. “Strain” tends to be used more for microorganisms and represents mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein.. Note that the mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. is slightly longer than the coding sequenceplugin-autotooltip__default plugin-autotooltip_bigCoding sequence: refers to the portion of DNA or mRNA in a gene that contains direct information on the gene product. In most cases, this means a portion of DNA or mRNA that correlates to codons. Note that not all parts of a gene will necessarily be coding sequence (e.g., intron sequences)..
DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. or RNA element Function
Promoterplugin-autotooltip__default plugin-autotooltip_bigPromoter: has multiple closely related but subtly different meanings depending on context:

* In bacteria, a promoter is a cis-acting DNA sequence near the transcription start site of a gene or operon that binds to bacterial RNA polymerase. * In eukaryotes, the formal definition of a promoter (also called a basal promoter) is a RNA
To target RNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigRNA polymerase: the enzyme that carries out RNA transcription. There are many different types of RNA polymerase, but in this book we collectively refer to them as just “RNA polymerase” for simplicity. to DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. and to start transcriptionplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription. of a mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. copy of the geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids..
Transcriptionplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription. stop (terminator) To instruct RNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigRNA polymerase: the enzyme that carries out RNA transcription. There are many different types of RNA polymerase, but in this book we collectively refer to them as just “RNA polymerase” for simplicity. to stop transcriptionplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription..
Shine-Dalgarno sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. and start codonplugin-autotooltip__default plugin-autotooltip_bigStart codon: usually the first AUG RNA sequence starting from the 5' end of the mRNA and reading in the 3' direction. AUG signals the ribosome to start translation. AUG also codes for the amino acid methionine. Shine-Dalgarno sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. in mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. will load ribosomesplugin-autotooltip__default plugin-autotooltip_bigRibosome: a very large and complex enzyme formed from both protein and rRNA subunits. Its primary function is to catalyze translation. to begin translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes.. Translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes. almost always begins at an AUG codonplugin-autotooltip__default plugin-autotooltip_bigCodon: a three nucleotide sequence that is read by the ribosome and specifies an amino acid that is added to a growing poplypeptide chain based on the genetic code. in the mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. (an ATG in the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. becomes an AUG in the mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. copy). Synthesis of the proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. thus begins with a methionine (see Fig. 6.4).
Coding sequenceplugin-autotooltip__default plugin-autotooltip_bigCoding sequence: refers to the portion of DNA or mRNA in a gene that contains direct information on the gene product. In most cases, this means a portion of DNA or mRNA that correlates to codons. Note that not all parts of a gene will necessarily be coding sequence (e.g., intron sequences). Once translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes. starts, the coding sequenceplugin-autotooltip__default plugin-autotooltip_bigCoding sequence: refers to the portion of DNA or mRNA in a gene that contains direct information on the gene product. In most cases, this means a portion of DNA or mRNA that correlates to codons. Note that not all parts of a gene will necessarily be coding sequence (e.g., intron sequences). is translated by the ribosomeplugin-autotooltip__default plugin-autotooltip_bigRibosome: a very large and complex enzyme formed from both protein and rRNA subunits. Its primary function is to catalyze translation. with the help of transfer RNAsplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing. (tRNAs), which read three bases at a time in consecutive sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. (one after the other with no overlap). Amino acidsplugin-autotooltip__default plugin-autotooltip_bigAmino acid: molecules that are polymerized to form proteins. will be incorporated into the growing polypeptide chain according to the genetic codeplugin-autotooltip__default plugin-autotooltip_bigGenetic code: the code that matches codons with specific amino acids. Since each codon is 3 nucleotides long (i.e., the genetic code is a triplet code) and there are 4 different RNA nucleotides (G, A, U, and C), the genetic code could in theory specify up to $4^3=64$ different amino acids. But since there are only 20 different (see Fig. 6.4).
Translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes. stop When one of the three stop codonsplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code. (UAG, UGA, or UAA; see genetic codeplugin-autotooltip__default plugin-autotooltip_bigGenetic code: the code that matches codons with specific amino acids. Since each codon is 3 nucleotides long (i.e., the genetic code is a triplet code) and there are 4 different RNA nucleotides (G, A, U, and C), the genetic code could in theory specify up to $4^3=64$ different amino acids. But since there are only 20 different in Fig. 6.4) is encountered during translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes., the polypeptide will be released from the ribosomeplugin-autotooltip__default plugin-autotooltip_bigRibosome: a very large and complex enzyme formed from both protein and rRNA subunits. Its primary function is to catalyze translation..

Table 1: Table 7.1. Bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) parts list.

For example, a bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) coding sequenceplugin-autotooltip__default plugin-autotooltip_bigCoding sequence: refers to the portion of DNA or mRNA in a gene that contains direct information on the gene product. In most cases, this means a portion of DNA or mRNA that correlates to codons. Note that not all parts of a gene will necessarily be coding sequence (e.g., intron sequences). that is 1,200 nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. base pairsplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A. in length (including the ATG start codonplugin-autotooltip__default plugin-autotooltip_bigStart codon: usually the first AUG RNA sequence starting from the 5' end of the mRNA and reading in the 3' direction. AUG signals the ribosome to start translation. AUG also codes for the amino acid methionine. but not including the stop codonplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code.) will specify the sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. of a proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. 1200/3 = 400 amino acidsplugin-autotooltip__default plugin-autotooltip_bigAmino acid: molecules that are polymerized to form proteins. long. Since the average molecular weight of an amino acidplugin-autotooltip__default plugin-autotooltip_bigAmino acid: molecules that are polymerized to form proteins. is 110 Daltons (1 Da = 1 g/mol), this geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) encodes a proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. of about 44 kDa - the size of a pretty average proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes.. There are also important cis-acting regulatory elementsplugin-autotooltip__default plugin-autotooltip_bigCis-acting regulatory element: a DNA sequence that is usually located near and controls the expression of a gene or genes. Includes elements such as enhancers (e.g., $UAS_{GAL}$), repressors (e.g., $URS_{GAL}$), operators (e.g., $lacO$), promoters (e.g., the $GAL4$ promoter that contains the TATA box; another example is $lacP$ in E. coli), etc. that are outside of the coding sequenceplugin-autotooltip__default plugin-autotooltip_bigCoding sequence: refers to the portion of DNA or mRNA in a gene that contains direct information on the gene product. In most cases, this means a portion of DNA or mRNA that correlates to codons. Note that not all parts of a gene will necessarily be coding sequence (e.g., intron sequences). of a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) but nonetheless are important for a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-)'s function (discussed further in Chapter 10).

Identifying a gene based on DNA sequence data

In earlier chapters we discussed how genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) are classically identified by their function. That is, the existence of the geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) is recognized because of mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. in the geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) that give an observable phenotypicplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism. change when compared to wildtypeplugin-autotooltip__default plugin-autotooltip_bigWildtype: a reference strain of an organism that scientists operationally define as “normal” to which mutants are compared. Not to be confused with wild organisms.. Historically, many genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) have been discovered because of mutationsplugin-autotooltip__default plugin-autotooltip_bigMutation: a change in the DNA of a gene that results in a change of phenotype compared to a reference wildtype allele. See also: mutant. and their effects on phenotypeplugin-autotooltip__default plugin-autotooltip_bigPhenotype: an observable feature or property of an organism.. Genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) were then mapped and/or cloned (and therefore physically identified) by complementationplugin-autotooltip__default plugin-autotooltip_bigComplementation: a concept where an additional allele of a gene (usually a wildtype allele) can provide normal function to an organism with a recessive loss of function mutation in that gene. The concept of complementation underlies the complementation test. (Chapter 09).

Now in the era of large-scale and inexpensive genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA, many genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) of no known function can be detected by looking for patterns in DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. The simplest method that works for bacterialplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) (but not for most eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), as we will see later) is to look for stretches of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. that lack stop codonsplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code.. These sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. are known as open reading framesplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon. or ORFsplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon.. This method works because a truly random sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. should contain an average of one stop codonplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code. in about every 21 codonsplugin-autotooltip__default plugin-autotooltip_bigCodon: a three nucleotide sequence that is read by the ribosome and specifies an amino acid that is added to a growing poplypeptide chain based on the genetic code. (i.e., there are a total of 64 possible codonsplugin-autotooltip__default plugin-autotooltip_bigCodon: a three nucleotide sequence that is read by the ribosome and specifies an amino acid that is added to a growing poplypeptide chain based on the genetic code. and there are 3 stop codonsplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code.). Thus, the probability of a random occurrence of even a short open reading frameplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon. of (for instance) 100 codonsplugin-autotooltip__default plugin-autotooltip_bigCodon: a three nucleotide sequence that is read by the ribosome and specifies an amino acid that is added to a growing poplypeptide chain based on the genetic code. (or 300 base pairsplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A. (bp) of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.) without a stop codonplugin-autotooltip__default plugin-autotooltip_bigStop codon: a codon that instructs the ribosome to terminate translation. Stop codons are UGA, UAG, or UAA in the universal genetic code. is relatively small: $p = (\frac{61}{64})^{100} = 8.2 \times 10^{-3}$, or slightly better than a 1 in 10,000 chance. Since the genetic codeplugin-autotooltip__default plugin-autotooltip_bigGenetic code: the code that matches codons with specific amino acids. Since each codon is 3 nucleotides long (i.e., the genetic code is a triplet code) and there are 4 different RNA nucleotides (G, A, U, and C), the genetic code could in theory specify up to $4^3=64$ different amino acids. But since there are only 20 different is a triplet code, one would need to search through three possible reading frames for ORFsplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon..

Identifying genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) in DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. from eukaryotesplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. is usually more difficult than in bacteriaplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably.. First, this is because geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) coding sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. in eukaryotesplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. are separated by long sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. that do not code for proteinsplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. called non-coding intergenic regionsplugin-autotooltip__default plugin-autotooltip_bigIntergenic region: a region of DNA that sits between genes and does not contain other genes or coding sequences. Integenic regions usually contain various cis-acting regulatory elements that control the expression of nearby genes.. In other words, eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genomesplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of are usually less geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-)-dense than prokaryotesplugin-autotooltip__default plugin-autotooltip_bigProkaryote: an organism that does not have membrane bound organelles. In this book prokaryotes refer to bacteria., so there is a lot more DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. to search through. Moreover, the coding sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. of most eukaryoteplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) are interrupted by intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns., which are sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. that are removed (spliced) from the mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. before translationplugin-autotooltip__default plugin-autotooltip_bigProtein translation: the process of using an mRNA as a template to synthesize a protein based on the genetic code, using ribosomes.. In other words, eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) at the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. level are not co-linear with proteinsplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes.. The presence of intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns. breaks up the open reading framesplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon. into short segments called exonsplugin-autotooltip__default plugin-autotooltip_bigExon: the portion of a pre-mRNA that contains codons. pre-mRNAs usually contain multiple exons that are interspersed with introns. Formally, exons are defined as a component of pre-mRNA, but the DNA sequence that codes for exons can also be informally referred to as exons.; exonsplugin-autotooltip__default plugin-autotooltip_bigExon: the portion of a pre-mRNA that contains codons. pre-mRNAs usually contain multiple exons that are interspersed with introns. Formally, exons are defined as a component of pre-mRNA, but the DNA sequence that codes for exons can also be informally referred to as exons. are joined together during splicingplugin-autotooltip__default plugin-autotooltip_bigIntron splicing: the act of remove introns and joining exons from pre-mRNA to form an mRNA. This occurs in the nucleus of eukaryotic cells and is catalyzed by an enzyme called the spliceosome. to making genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) much harder to distinguish from non-coding sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. The process of splicingplugin-autotooltip__default plugin-autotooltip_bigIntron splicing: the act of remove introns and joining exons from pre-mRNA to form an mRNA. This occurs in the nucleus of eukaryotic cells and is catalyzed by an enzyme called the spliceosome. removes intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns. and joins exonsplugin-autotooltip__default plugin-autotooltip_bigExon: the portion of a pre-mRNA that contains codons. pre-mRNAs usually contain multiple exons that are interspersed with introns. Formally, exons are defined as a component of pre-mRNA, but the DNA sequence that codes for exons can also be informally referred to as exons. to form mature mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein.. This makes computer programs that search for ORFsplugin-autotooltip__default plugin-autotooltip_bigOpen reading frame (ORF): an RNA sequence that begins with a start codon and continues for at least a defined distance (usually 100 codons, or 300 nucleotides long) without encountering an in-frame stop codon. much less useful when trying to find eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) from a large genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. See Chapter 12 for details on splicingplugin-autotooltip__default plugin-autotooltip_bigIntron splicing: the act of remove introns and joining exons from pre-mRNA to form an mRNA. This occurs in the nucleus of eukaryotic cells and is catalyzed by an enzyme called the spliceosome..

The maps below in Fig. 2 show 50 kbp (1 kbp = 103 bp) segments of genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. from yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02, Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research., and humans. The dark gray boxes represent exonsplugin-autotooltip__default plugin-autotooltip_bigExon: the portion of a pre-mRNA that contains codons. pre-mRNAs usually contain multiple exons that are interspersed with introns. Formally, exons are defined as a component of pre-mRNA, but the DNA sequence that codes for exons can also be informally referred to as exons., and the light gray boxes represent intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns.. The boxes above the lineplugin-autotooltip__default plugin-autotooltip_bigStrain or line: refers to a pool or colony of individuals or cultured cells of a desired genotype or phenotype that is mostly homogeneous and can be bred and/or produced in perpetuity for research or commercial purposes. “Strain” tends to be used more for microorganisms and are transcribedplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription. from left to right and the boxes below are transcribedplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription. from right to left. Remember that DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is double-stranded and coding sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. are read in a 5’ to 3’ direction, so genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) can exist on either strand of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. but in opposite directions. Names have been assigned to each of the identified genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). Although the yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02 genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) contain few intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns. and are packed closely together, the Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research. and human genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) are spread apart and interrupted by many intronsplugin-autotooltip__default plugin-autotooltip_bigIntron: sequences in a pre-mRNA that sit between exons and are removed by splicing. Formally speaking, introns are defined as being part of pre-mRNA, but the DNA sequence that codes for introns can also be informally described as introns.. Sophisticated computer algorithms and/or experimental approaches are used to identify these dispersed geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. We will examine eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) structure more closely in Chapter 13.

Figure 2: Segments of physical geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) maps from yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02, Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research., and humans. Using yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02 as an example: labels that follow standard yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02 nomenclature conventions (e.g., RGD2, FET5) have likely been studied and are curated in the yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02 genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of database. By comparison, the code-like letters and numbers (such as YFL046W) are just code names for predicted or poorly characterized genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) for which their function is probably not yet known. See text for other details.

An experimental way to identify eukaryoticplugin-autotooltip__default plugin-autotooltip_bigeukaryote: organism whose cells have membrane bound organelles, including the nucleus. genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) physically is by examining mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. instead of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. If an mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. exists in a cell, this means that it was most likely transcribedplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription. from a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). mRNAsplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. can be purified from cells biochemically, and an enzymeplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. called reverse transcriptaseplugin-autotooltip__default plugin-autotooltip_bigReverse transcriptase: an enzyme that is usually isolated from retroviruses. It catalyzes the formation of cDNA using RNA (usually mRNA) as a template. (usually isolated from various types of retroviruses) can be used to convert mRNAsplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. into DNAs called complementary DNAsplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. (cDNAsplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes.). cDNAplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. does not exist in nature - it is created by scientists in the lab. We can sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. (discussed below) these cDNAsplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. and compare the sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. to genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. to identify and locate genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-). Randomly sequenced cDNAsplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. derived from mRNAsplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. isolated from cells or tissues are often called expressed sequence tagsplugin-autotooltip__default plugin-autotooltip_bigExpressed sequence tag (EST): a cDNA from a cDNA library that has been sequenced and catalogued. (ESTsplugin-autotooltip__default plugin-autotooltip_bigExpressed sequence tag (EST): a cDNA from a cDNA library that has been sequenced and catalogued.). If an ESTplugin-autotooltip__default plugin-autotooltip_bigExpressed sequence tag (EST): a cDNA from a cDNA library that has been sequenced and catalogued. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. matches small stretches of genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids., that would suggest that a geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) is there even if we don't know what the geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) or gene productplugin-autotooltip__default plugin-autotooltip_bigGene product: the molecule that is produced based on information contained within a gene and provides function to the organism. Most of the time, a gene product is a protein. Sometimes gene products can also be an RNA molecule. In forward genetic analysis, we can't formally tell if a gene product is (proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes.) does. There are now more modern technologies, such as RNAseqplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing. (see below), that are much faster than traditional cDNAplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes.-based approaches - the details are not important for now, but RNAseqplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing. is very similar to the NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. technologies introduced below. This book is more focused on how to conceptually analyze geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) function rather than details of molecular and genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of approaches to identifying genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-).

How to sequence DNA: background information

Reading the order of nucleotidesplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. from a piece of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is called DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA. To see how geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. from DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. are actually obtained, we will first need to revisit some fundamentals of the chemical structure of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. and how DNA replicationplugin-autotooltip__default plugin-autotooltip_bigDNA replication: usually the process of starting with a dsDNA molecule and ending with two identical copies of that dsDNA molecule. In most cases, “replication” implies DNA replication. works (see also Chapter 06). DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is double-stranded (this is commonly abbreviated as dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C.). Each strand of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is directional - the two ends of a single strand of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. (ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods.) are structurally different. The different ends are usually called the 5’ and 3’ ends. This refers to different positions on the ribose sugar ring where the linking phosphateplugin-autotooltip__default plugin-autotooltip_bigPhosphate: a functional group with the chemical formula -PO43-. residues attach (Fig. 3).

Figure 3: Chemical structure of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. revisited. The “phosphateplugin-autotooltip__default plugin-autotooltip_bigPhosphate: a functional group with the chemical formula -PO43-.-deoxyribose backbone” in the diagram is also called the phosphodiester backbone. It can also be helpful to revisit Fig. 6.1. Source: Wikimedia. Credit: M.P. Ball. Licensing: CC0 1.0.

In a double-stranded DNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C. molecule, the two strands run anti-parallel to one another and are held together by base pairsplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A. through hydrogen bondsplugin-autotooltip__default plugin-autotooltip_bigHydrogen bond: a type of weak noncovalent bond that forms between atoms that are part of a dipole movement.. Fig. 3 is structurally correct but difficult to conceptualize when thinking about DNA replicationplugin-autotooltip__default plugin-autotooltip_bigDNA replication: usually the process of starting with a dsDNA molecule and ending with two identical copies of that dsDNA molecule. In most cases, “replication” implies DNA replication.. A more abstract model of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. structure that is more useful for thinking about DNA replicationplugin-autotooltip__default plugin-autotooltip_bigDNA replication: usually the process of starting with a dsDNA molecule and ending with two identical copies of that dsDNA molecule. In most cases, “replication” implies DNA replication. is shown in Fig. 4:

Figure 4: conceptual diagram showing the antiparallelplugin-autotooltip__default plugin-autotooltip_bigAntiparallel: a term used to describe how the orientation of the two strands of dsDNA are opposite to each other. and base pairingplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A. nature of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C.. Credit: M. Chao.

DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as is the key enzymeplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. in the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA methods we will be considering. The general reaction carried out by DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as is to synthesize a copy of a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. template using the chemical precursors dATPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, dGTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, dCTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, and dTTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just (deoxyribonucleotide triphosphatesplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, collectively abbreviated as dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just). There are lots of different kinds of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymerases, many with specialized functions or role. But all DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. polymerases have two fundamental properties in common:

  • New DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is synthesized only by elongation of an existing strand at its 3’ end.
  • Synthesis requires three things: dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, a template strand, and an existing nucleic acid with a free 3’ hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. (-OH) end base paired to the template strand called a primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating.

A general substrateplugin-autotooltip__default plugin-autotooltip_bigSubstrate: a molecule that undergoes a biochemical reaction catalyzed by an enzyme. By inference, a substrate physically binds to its cognate enzyme. for DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as looks like Fig. 5:

Figure 5: The substrateplugin-autotooltip__default plugin-autotooltip_bigSubstrate: a molecule that undergoes a biochemical reaction catalyzed by an enzyme. By inference, a substrate physically binds to its cognate enzyme. for DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as includes a template strand, a primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating with a free 3' hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group, and dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just (not shown). The primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating can be made from single-stranded DNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. (ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods.) or RNAplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing., but in this diagram the primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating is ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods.. Credit: M. Chao.

Note that the template strand can be as short as 1 base or as long as several thousand bases. The duplex region (i.e., the double stranded region) of this substrateplugin-autotooltip__default plugin-autotooltip_bigSubstrate: a molecule that undergoes a biochemical reaction catalyzed by an enzyme. By inference, a substrate physically binds to its cognate enzyme. has to be just large enough for DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as to bind to it. As a general rule of thumb, this region should be at least around 20 bp in length (it can be much longer but not shorter). After addition of DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as and dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, a single strand of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. will be synthesized, starting from the 3' hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group of the primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating and progressing in a 5' to 3' direction. This direction of synthesis is relative to the new strand being made, not the template strand (Fig. 6).

Figure 6: DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. synthesis by DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as occurs in a 5' to 3' direction and initiates off of the 3' hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group of the primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating. Credit: M. Chao.

DNA Sequencing: the details

We first discuss an older but still relevant type of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA technology called Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger.. The basic method was invented in 1977 and is named after its inventor Frederick Sanger. Consider a segment of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C. that is about 1000 base pairsplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A. long that we wish to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. To sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. this DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth., we first need to have a source of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. material, which we consider further below. Assuming we have several μg (microgram; 1 μg = 10-6 g) of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. template to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids., we first separate the two DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. strands by heating the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. to about 100 °C to melt the hydrogen bondsplugin-autotooltip__default plugin-autotooltip_bigHydrogen bond: a type of weak noncovalent bond that forms between atoms that are part of a dipole movement. that hold the ssDNAs together through base pairingplugin-autotooltip__default plugin-autotooltip_bigBase pair: a term used to describe how nitrogenous bases (G, A, T/U, and C) in nucleic acids interact with each other via hydrogen bonds to form double-stranded molecules (including dsDNA, dsRNA, and DNA/RNA hybrids). G always pairs with C, and T/U always pairs with A..

Next, a short single-stranded primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating (about 18-20 bases long) designed to be complimentary to the end of one of the strands is allowed to anneal to the single stranded DNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods.. These primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating are designed with the help of a computer and synthesized through commercially available services. The primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating is added at a huge molar excess compared to the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. you are trying to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. – so most ssDNAs will pair with primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. rather than their original complementary partner ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods.. The resulting DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. hybrid looks much like the general DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as substrateplugin-autotooltip__default plugin-autotooltip_bigSubstrate: a molecule that undergoes a biochemical reaction catalyzed by an enzyme. By inference, a substrate physically binds to its cognate enzyme. shown in Fig. 5.

DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as is then added along with the four dNTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursors (dATPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, dGTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, dCTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just, and dTTPplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just). A small quantity of a slightly different nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursor called a dideoxyribonucleotide triphosphateplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. is also added. Dideoxy nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursors are abbreviated ddATPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., ddGTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., ddCTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., and ddTTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. (or ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. collectively). The ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. have each been chemically labeled with a unique fluorophoreplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. that emits a different color of light after stimulation with a laser – for instance, green for ddATPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., cyan for ddCTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., yellow for ddGTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP., and red for ddTTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP.. These molecules are identical to the normal dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just in all respects except that they lack a hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group at their 3’ position (3’ OH) and that their nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. bases are chemically labeled with a fluorophoreplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. (Fig. 7; fluorophoreplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. not shown). ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. are also called chain terminators.

Figure 7: Structural comparison between ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. and dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just. ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. do not have a hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group on the 3' carbon of the ribose ring. Therefore, once a ddNTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. is added to the end of a newly synthesized DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. strand, that strand cannot be polymerized further by DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as. Source: OpenStax Microbiology. Licensing: CC BY 4.0.

The polymerase reactions are initiated, usually by moving the reaction mix from an ice bath to a 37 °C bath. After the DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as reactions are complete, the samples are denatured with heat again and separated using an electrical field through a high-resolution gel system that allows DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. strands of different lengths to be resolved; even differences of 1 nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. can be resolved1)! The DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. can be read from the gel by scanning it with a laser and using a light detector to visualize different colors of fluorescent light (Fig. 8).

Figure 8: Automated Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger.. On the left side of the image, each duplex containing a terminated chain is a different DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. molecule; this is why a large quantity of starting DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is needed for Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger.. The chains are terminated randomly, but the diagram shows the terminated chains in order of size for illustrative purposes. Note the order of the colored bands – compare the lengths of the randomly terminated strains on the left with the order of migration on the gel on the right. Separating them on a gel based on size orders the bands such that the ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. reveal the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. Credit: M. Chao.

ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. can be incorporated into DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth., but once a ddNTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. has been incorporated, further elongation stops because the resulting DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. will no longer have a free 3’ OH end. Each of the ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. is added at about 1% the concentration of the normal nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursors. Thus, using ddATPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. as an example, about 1% of the elongated chains will randomly terminate at the position of an A in the sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.; the same will be true for the other ddNTPsplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP.. Once all the elongating chains have been terminated, there will be a population of newly synthesized and fluorescently labeled ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. strands that have terminated at the position of the sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. The right-hand side of Fig. 8 (the part with colorful horizontal bands) represents a gel where DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments of different lengths, each ending with a chain terminator, are separated using electrophoresisplugin-autotooltip__default plugin-autotooltip_bigElectrophoresis: an experimental technique that is used to separate charged molecules (such as nucleic acids) by size by applying an electric field to cause the charged molecules to move through a sheet of a gel-like material. on a high-resolution gel. The DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments migrate toward the positively charged cathode (because the phosphateplugin-autotooltip__default plugin-autotooltip_bigPhosphate: a functional group with the chemical formula -PO43-. groups on the backbone of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. are negatively charged), and shorter fragments migrate faster than longer fragments; this technique is called electrophoresisplugin-autotooltip__default plugin-autotooltip_bigElectrophoresis: an experimental technique that is used to separate charged molecules (such as nucleic acids) by size by applying an electric field to cause the charged molecules to move through a sheet of a gel-like material.. A laser and light detector coupled to a computer then automatically reads the different colored bands in order and determines the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. (Fig. 8).

Polymerase Chain Reaction (PCR)

Now let’s consider how to physically obtain DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. for sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA. A relatively large amount of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. (approx. 1 μg worth for a piece of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. several kbp long) is needed for the Sanger chemistry to work. As a student, you might not have a feel for how much DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. this is, but it's a substantial amount! To give you an idea of the scale of the problem, see Exercise 1 below. In the early days of molecular biologyplugin-autotooltip__default plugin-autotooltip_bigMolecular biology: the study of nucleic acids, specifically DNA and RNA. research, DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. for sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA was obtained from cloned DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. segments, which can be difficult to create, but once created can easily be produced to give the quantity of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. needed (molecular cloningplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
is still used today for various purposes). We will discuss some methods for cloningplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
new genesplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) in Chapter 09.

If we want to quickly find the sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. of a new mutantplugin-autotooltip__default plugin-autotooltip_bigMutant: an individual that has a different phenotype than wildtype and likely contains one more mutations that cause this difference. alleleplugin-autotooltip__default plugin-autotooltip_bigAllele: a version of a gene. Alleles of a gene are different if they have differences in their DNA sequence. of a known geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), we need an easy way to obtain a relatively large quantity of this DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. without needing to go through molecular cloningplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
. The easiest and most common way to do this is to use an in vitroplugin-autotooltip__default plugin-autotooltip_bigIn vitro: taking place outside of a living organism, usually in a test tube or Petri dish. method known as the polymerase chain reactionplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. (PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods.) that was developed by Kary Mullis in the mid-1980s (Fig. 7). The steps in PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. are as follows:

  1. A crude preparation of chromosomalplugin-autotooltip__default plugin-autotooltip_bigChromosome: a structure that organizes dsDNA in a cell through interactions with various DNA binding proteins. DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is extracted from the tissue source of interest (there is usually not enough DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. for sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA from this step).
  2. Two short primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating (each about 18-20 bases long) are added to the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. at an enormous molar excess. The primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating are designed from the known genomicplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. to be complimentary to opposite strands of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. and to flank the chromosomalplugin-autotooltip__default plugin-autotooltip_bigChromosome: a structure that organizes dsDNA in a cell through interactions with various DNA binding proteins. segment of interest.
  3. The double stranded DNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C. is melted by heating to around 100 ˚C (in practice we usually use 95 °C) and then the mixture is cooled (usually around 50 °C) to allow the primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating to anneal to the template DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. Since there is a huge molar excess of primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating vs. template, most of the template will anneal with primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating rather than re-anneal with its original partner strand.
  4. DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as and the four nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursors are added, and the reaction is incubated at around 72 ˚C for a period of time to allow a copy of the segment to be synthesized. The reason we use 72 °C instead of 37 °C like we do for most enzymatic reactions is that we use a special heat stable enzymeplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. called Taq DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigTaq DNA polymerase (pronounced “tack”; often just abbreviated as Taq polymerase) is a heat stable DNA polymerase that retains functionailty even after repeated heating and cooling cycles used during the polymerase chain reaction. Standard enzymes will typically denature an lose activity after repeated heating and cooling. It is isolated from the instead of standard DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as.
  5. Repeat steps 3 and 4 multiple times (up to 30-35 cycles). To avoid the inconvenience of having to add new DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as in each cycle (due to the heating cycle eliminating DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as activity), a special DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as called Taq polymerase that can withstand heating to 100 ˚C is used.

The idea behind PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. is that in each cycle of melting, annealing, and DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. synthesis, the amount of the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. segment is doubled. This gives an exponential increase2) in the amount of the specific DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. bounded by the primersplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating on either side as the cycles proceed. After 10 cycles the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is amplified 210-fold (210 = 1024; in other words, about 1000-fold) and after 20 cycles the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. will be amplified 220-fold (or approximately 106-fold). Amplification usually continues until all of the nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. precursors are incorporated into synthesized DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. The resulting PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. product can be used for DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA.

Figure 9: Polymerase chain reactionplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. (PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods.). See text for details. Source: National Human Genome Research Institute. Licensing: public domainplugin-autotooltip__default plugin-autotooltip_bigPublic domain: see Wikipedia entry on public domain..

Let's do some quick back of the envelope math to see if the numbers add up. Let's assume we start with a single molecule of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C. 1000 bp long that we want to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. Our goal is to get 1 μg of this DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. to use in Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger.. How many cycles of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. do we need to do to get this amount of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.?

The first thing we need to do is to figure out how many molecules is 1 μg of a 1000 bp fragment of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C.. From a quick Google search, we learn that the average molecular weight of a nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. is approximately 330 Da (g/mol). Since DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is double-standed, this means the the average molecular weight of a base pair is 660 g/mol, and the approximate molecular weight of a 1000 bp dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C. fragment is therefore:

$$660 \times 1000=6.6 \times 10^5 \text{ g/mol}$$

1 μg of this DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is equivalent to:

$$\frac{1 \times 10^{-6} \text{ g}}{6.6 \times 10^5 \text{ g/mol}} = 1.5 \times 10^{-12} \text{ mol}$$

Using Avogadro's number we can convert this to number of molecules:

$$1.5 \times 10^{-12} \text{ mol} \times 6 \times 10^{-23} \text{ molecules/mol} = 9 \times 10^{11} \text{ molecules}$$

Finally, we solve the exponential equation to find $n$, which is number of cycles:

$$ 2^n = 9 \times 10^{11}\\ n\log{2} = \log{9} + 11\\ n = \frac{\log{9}+11}{\log{2}} \approx 38 $$

Starting with a single molecule of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C., you would only need 38 cycles of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. to obtain 1 μg of a 1 kb long DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. In reality, you rarely start with just a single molecule of dsDNAplugin-autotooltip__default plugin-autotooltip_bigDouble-stranded DNA (dsDNA): DNA that consists of two complementary strands of ssDNA paired together via hydrogen bonds between the nitrogenous bases G, A, T, and C., so 20-30 cycles of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. is usually enough even when you have very little starting material. Each cycle of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. takes approx. 1-2 minutes, so the entire procedure takes just 1-2 hours.

The exponential amplification of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. makes it relatively easy and straightforward to generate large quantities of a piece of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. for various types of analysis, even if you have a small amount of starting material. For example, nearly all of the forensic DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. analysis that is dramatized on police procedural shows like NCIS or Criminal Minds is done using PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. in real life crime labs. PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. can also be used for genotyping human diseases or for paternity testing.

NGS technologies

NGS vs Sanger sequencing

The Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. method described above was used to complete the sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA of the genomesplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of of many organisms we have talked about (or will talk about) in this class, including the bacteriumplugin-autotooltip__default plugin-autotooltip_bigBacteria: Single-celled organisms that also utilize DNA and the standard genetic code as all organisms on earth, but unlike eukaryotes do not have intracellular membranes and membrane-bound organelles. In this book we use bacteria and prokaryote interchangeably. Eschericha coliplugin-autotooltip__default plugin-autotooltip_bigEscherichia coli: an enteric bacterium used both as a model organism and as a utility organism in genetics research. E. coli is commonly used to host various cloning vectors, such as plasmids, cosmids, F factors, and bacterial artificiak chromosomes (BACs)., yeastplugin-autotooltip__default plugin-autotooltip_bigYeast: in this book, refers to Saccharomyces cerevisiae, a single-celled eukaryotic microbe used as a model genetic organism. See Chapter 02, Drosophilaplugin-autotooltip__default plugin-autotooltip_bigDrosophila melanogaster: a fruit fly species used in genetics research., C. elegansplugin-autotooltip__default plugin-autotooltip_bigCaenorhabditis elegans: a non-parasitic non-pathogenic nematode species used as a model organism in genetics researhc., and even humans. This technology is still used today for certain purposes. However, next generation sequencingplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. (NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc.) technologies developed over the last two decades or so has made DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA significantly faster and cheaper. For instance, the original Human Genome Project, which started in 1990, took 13 years and thousands of researchers, and cost \$2.7 billion dollars to complete. With current NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. technologies, that cost has dropped to \$600 per genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of as of this writing in 2023 and uses a fully automated machine that one technician can easily operate. A complete human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of can be sequenced now in just a few days.

Illumina sequencing

There are several different types of NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. technology. The most common type of NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. is called Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. (Figs 10-11), and the sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA chemistry in this method is called sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA by synthesis. Despite the fancy name it is not conceptually different than Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. in that it still depends on DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as. The key difference is in how many templates are sequenced at once. Instead of sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA one fragment of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. at a time, Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. takes an entire genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of (for instance, from a cancer patient tumor biopsy sample) and fragments it into millions of small pieces, each around 300-600 bp long. These fragments are affixed to a flow cell - a device that is roughly the size of a microscope slide – and amplified in situ by a process called solid-phase bridge PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods.. Each fragment, once affixed and amplified, has a unique position on the slide. In essence, you are forming a “colonyplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
” of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. clonesplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
at different positions on the flow cell. This process is called cluster generation (Fig. 10).

The flow cell is then exposed to sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA reagents similar to Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger., except that instead of ddNTPplugin-autotooltip__default plugin-autotooltip_bigDideoxyribonucleotide triphosphate (ddNTP): A modified dNTP that functions as a chain terminator in Sanger sequencing. Includes ddGTP, ddATP, ddTTP, and ddCTP. chain terminators, fluorescently labeled dNTPsplugin-autotooltip__default plugin-autotooltip_bigDeoxyribonucleotide triphosphates (dNTPs): the collective name for four different molecules (dGTP, dATP, dTTP, dCTP) that are substrates for DNA polymerase and are used to form DNA. These are sometimes abbreivated as “deoxynucleotides” (which clearly refers to dNTPs) or just with a removable blocking group are used that pauses the elongation reaction each time a new nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. is added (Fig. 11). Just like with Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger., each added nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. is coupled to a fluorophoreplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. that emits a different color. A camera takes a picture of the entire slide, and a computer keeps track of the fluorescent signals at each unique position within the flow cell. The blocking groups and fluorophoresplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. are then chemically removed, the flow cell washed, and new sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA reagents are added to “sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.” the next nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs.. Each individual template is sequenced very slowly compared to Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger.. For instance, the speed of DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as is roughly 100 bp/sec - so that is how fast Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. can go3). In Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing., adding one nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs., taking a picture, removing the fluorophoresplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. and blocking agents, and re-starting the sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA reaction is slow - this step can take minutes. The trick is that Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. is analyzing millions of fragments of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. at the same time (it is analyzing multiple DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments in parallel). Thus, on a strand-by-strand basis it is slow, but in terms of number of bases sequenced per unit time, it is orders of magnitude faster than Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger..

Figure 10: Cluster generation for Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing.. The diagram shows some of the details of cluster generation. Steps (1)-(3) are known as bridging PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods.. As a student, the important concept is that each cluster contains identical copies (or clonesplugin-autotooltip__default plugin-autotooltip_bigClone: Depending on the context, this word can have a few different meanings:

* In the context of genes, cloning means that the physical identity of a gene has been found, and the gene has been sequenced. * In the context of DNA, a cloned DNA fragment is one that has been inserted into some kind of
) of ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. that are affixed to a specific physical location on the surface of the flow cell. Credit: M. Chao.

A computer determines the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. of each fragment by analyzing the fluorescent signals during each cycle of sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA and assembles the sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. from individual fragments into a complete sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids.. The error rate in this process is relatively high compared to Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. – on average, one error is called per every 1000 bases sequenced, but sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA depth (that is, the number of times a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. exists in different fragments that are read) can overcome the error rate by reading the same sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. multiple times on different DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments.

Figure 11: Sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA by synthesis in Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing.. The diagram shows a simplified schematic. The left side of the diagram is a side view of the flow cell, whereas the right side shows a top down view. Each DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. cluster only shows a single ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. template for simplicity, but in reality there can be many hundred copies in each cluster (see Fig. 10). Also, each ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. template is usually 300-600 nt long but here a short ssDNAplugin-autotooltip__default plugin-autotooltip_bigSingle-stranded DNA (ssDNA): a polymerized chain of deoxyribonucleotides that is not paired with a complementary polymer. Usually formed by denaturing dsDNA with heat or other methods. is shown for illustrative purposes. Sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA originates off of a universal primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating based on the red sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. in Fig. 7.8; in this diagram the sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA primerplugin-autotooltip__default plugin-autotooltip_bigPrimer: (1) A short (usually 15-30 nt long, depending on application) piece of ssDNA that is synthesized in vitro and used in applications such as DNA sequencing and PCR. In this context, primers can also be called oligonucleotides. (2) a free 3'-OH end on a nucleic acid that can be used for initiating is simplified as a 3 nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. 5'-CGG-3'; in reality it will be much longer (~20-30 nt). Each type of nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. (G, A, T, or C) contains a fluorophoreplugin-autotooltip__default plugin-autotooltip_bigFluorophore: a molecule that emits a particular wavelength (i.e., color) of light after it has been stimulated with a lower wavelength (i.e., higher energy) of light. For instance, fluorescein is a fluorophore is stimulated by blue light and emits green light. that emits a different color; it also contains a blocking group (black octagon) on the 3' carbon of the nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs.. When the blocking group is removed, it regenerates a hydroxylplugin-autotooltip__default plugin-autotooltip_bigHydroxyl: a functional group with the chemical formula -OH. group so that the next round of sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA can proceed. The diagram shows just two clusters for simplicity; in reality, each flow cell can contain millions of different clusters. A fluorescence microscope scans the flow cell after each sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA reaction cycle and a computer keeps track of the results. Credit: M. Chao.

Other NGS technologies and applications

Technologies such as Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. are now the preferred method for most types of large-scale DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA, and it has been adapted for related technologies such as RNA sequencingplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing. (RNAseqplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing.). In RNA sequencingplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing., RNA is first converted to complementary DNAplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. (cDNAplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes.) using an enzymeplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. called reverse transcriptaseplugin-autotooltip__default plugin-autotooltip_bigReverse transcriptase: an enzyme that is usually isolated from retroviruses. It catalyzes the formation of cDNA using RNA (usually mRNA) as a template.; the cDNAplugin-autotooltip__default plugin-autotooltip_bigComplementary DNA (cDNA): DNA that has been reverse transcribed from spliced mRNA using reverse transcriptase. cDNA sequences are useful because they lack intron sequences and therefore contain information about the ORF of intron-containing eukaryotic genes. is then sequenced using standard Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing.. The number of reads of a particular RNA sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. gives information as to how much a particular geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) is expressedplugin-autotooltip__default plugin-autotooltip_bigExpression: a term used to describe the idea that the function of a gene is apparent and can be observed. Genes may not always be expressed all the time in all places. via transcriptionplugin-autotooltip__default plugin-autotooltip_bigRNA transcription: the process of RNA polymerase using the DNA sequence of a gene as a template to form an mRNA (in prokaryotes) or pre-mRNA (in eukaryotes). In most cases, “transcription” implies RNA transcription.. Unlike DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. (which is present at 2 copies per cell), mRNAsplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. can be present in hundreds of thousands of copies per cell; this means that Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. is sensitive enough to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. mRNAplugin-autotooltip__default plugin-autotooltip_bigmessenger RNA (mRNA): an RNA molecule that codes for protein. from single cells (single cell RNAseqplugin-autotooltip__default plugin-autotooltip_bigRNA sequencing (RNAseq): an experimental technique that sequences all the RNAs in a sample. It is based off of converting RNAs into cDNAs with reverse transcriptase, followed by Illumina sequencing., or scRNAseq).

Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. can also be adapted for other applications. For instance, proteinsplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. interact with DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. in vivoplugin-autotooltip__default plugin-autotooltip_bigin vivo: occurring inside a living organism. to form a dynamic structure called chromatinplugin-autotooltip__default plugin-autotooltip_bigChromatin: the collective structure of all proteins and DNA in the nucleus of a eukaryotic cell. Nucleosomes are stacked in different configurations to form chromatin structures of different density. Chromatin structure is regulated by a variety of chromatin modifying enzymes.. Let's say you are interested in a DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.-binding proteinplugin-autotooltip__default plugin-autotooltip_bigProtein: a molecule that is formed by the translation of messenger RNAs (mRNAs). Functions that proteins provide are what usually give organisms their phenotypes. called X. To find out what DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencesplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. X binds to, you can extract and purify chromatinplugin-autotooltip__default plugin-autotooltip_bigChromatin: the collective structure of all proteins and DNA in the nucleus of a eukaryotic cell. Nucleosomes are stacked in different configurations to form chromatin structures of different density. Chromatin structure is regulated by a variety of chromatin modifying enzymes. from cells, then use enzymesplugin-autotooltip__default plugin-autotooltip_bigEnzyme: a macromolecule, usually a protein (but sometimes an RNA), that functions as a catalyst of some kind of biochemical reaction. to gently cleave the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. into small fragments under conditions in which X still binds to DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. You can then purify X using antibodies and use Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments that co-purify with X. This procedure is called chromatinplugin-autotooltip__default plugin-autotooltip_bigChromatin: the collective structure of all proteins and DNA in the nucleus of a eukaryotic cell. Nucleosomes are stacked in different configurations to form chromatin structures of different density. Chromatin structure is regulated by a variety of chromatin modifying enzymes. immunoprecipitation sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA, or ChIPseq. There are many other similar applications too many to list and discuss here in detail.

For small scale DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA, Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. described above is still a commonly used method, although the cost for various NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. technologies have dropped so much that it is also starting to replace Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. for small scale sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA experiments. For instance, Nanopore sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA (Fig. 12) does not use DNA polymeraseplugin-autotooltip__default plugin-autotooltip_bigDNA polymerase: an enzyme that is usually involved in DNA replication. This enzyme uses deoxyribonucleotides (dNTPs) and a primed template as a substrate to synthesize a new strand of ssDNA that pairs with its template to form dsDNA. There are many different kinds of DNA polymerase, but in this book we collectively refer to them as and can be used to replace Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. in some routine applications. Nanopore sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA also has the advantage of being able to detect DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. that has been chemically modified, such as methylated bases. This is relevant for studying things like epigenetic inheritance, and it something that neither Sanger nor Illumina sequencingplugin-autotooltip__default plugin-autotooltip_bigIllumina sequencing: named after the company that developed this method, Illumina sequencing is currently the most widely used NGS technology for high throughput DNA sequencing. can easily do.

Figure 12: Examples of next generation sequencingplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc. (NGSplugin-autotooltip__default plugin-autotooltip_bigNext generation sequencing (NGS): DNA sequencing technologies that were developed after Sanger sequencing and are much more higher throughput than Sanger sequencing. Includes methods such as Illumina sequencing, 454 pyrosequencing, Nanopore sequencing, etc.) technologies. Source: Bharti, R., and Grimm, D.G. Briefings in Bioinformatics 22(1), http://dx.doi.org/10.1093/bib/bbz155. Licensing: CC BY 4.0.

Questions and exercises

Exercise 1. How much tissue do you need for DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA? Each Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. reaction uses about 1 μg DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth., and a typical Sanger reaction can give about 1 kbp of sequencingplugin-autotooltip__default plugin-autotooltip_bigSequencing: the procedure used to determine the sequence of a biological polymer such as DNA, RNA, or protein. Although there are indeed biochemical techniques that can be used to directly sequence RNA or protein, these methods are almost never used in modern molecular genetics research - instead, RNA information before the reaction runs out. Let's assume that a typical human geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-) you want to study and sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. is 1 kb long. Usually to obtain DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. from human samples, we draw blood. To get enough human DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. to sequenceplugin-autotooltip__default plugin-autotooltip_bigSequence: the precise order of monomers in a polymer. In DNA, it refers to the order of G, A, T, and C nucleotides. In RNA, it refers to the order of G, A, U, and C nucleotides. In proteins, it refers to the order of amino acids. this 1 kb geneplugin-autotooltip__default plugin-autotooltip_bigGene: read Chapters 02, 03, 04, 05, and 06 for a definition of gene :-), how much blood would you have to draw? You can assume the following: there are approximately 10,000 lymphocytes/mL of blood ; the average molecular weight of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. is 330 Da/nucleotideplugin-autotooltip__default plugin-autotooltip_bigNucleotide: molecules that are polymerized to form nucleic acids (DNA or RNA). Includes dNTPs and NTPs. (or 660 Da/base pair; 1 Da = 1 g/mol; 1 mol = 6×1023 molecules); and the diploidplugin-autotooltip__default plugin-autotooltip_bigDiploid: a term that describes a cell or organism that has two copies of similar genetic information, usually obtaining one copy from a male parent and the other copy from a female parent. human genomeplugin-autotooltip__default plugin-autotooltip_bigGenome: a dataset that contains all DNA information of an organism. Most of the time, this also includes annotation and curation of that information, e.g., the names, locations, and functions of genes within the genome. As an adjective (“genomic”), this usually is used in the context of is approximately 6×109 bp of DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.. We use lymphocytes (white blood cells) because red blood cells do not have nucleiplugin-autotooltip__default plugin-autotooltip_bigNucleus: in eukaryotes, the membrane-bound organelle in cells that contains the chromosomes. (and therefore do not have DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth.). Exact numbers are not super important when doing “back of the envelope” biology - what we care about here is the scale of the problem.

Exercise 2: Draw three cycles of PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. by hand using colored pens and paper. It is very important to show the 5' and 3' ends of all the DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. molecules, and to keep track of which strands are template strands and which strands are newly synthesized in each cycle. Do not just copy from Fig. 9 - try to draw it based on your understanding of the reading, and then compare your drawing to Fig. 9. What did you get right? What did you get wrong? Try again until you can get it fully correct.

1)
This kind of gel system is made of a material called polyacrylamide. The technique of “polyacrylamide gel electrophoresisplugin-autotooltip__default plugin-autotooltip_bigElectrophoresis: an experimental technique that is used to separate charged molecules (such as nucleic acids) by size by applying an electric field to cause the charged molecules to move through a sheet of a gel-like material.” is commonly abbreviated as PAGE.
2)
If you think about it, the idea behind PCRplugin-autotooltip__default plugin-autotooltip_bigPolymerase chain reaction (PCR): An experimental technique invented by Kary Mullis used to exponentially amplify DNA in vitro. PCR made obtaining large quantities of DNA for analysis much faster and easier than using traditional cloning methods. is actually quite simple. It is mimicking how DNA replicationplugin-autotooltip__default plugin-autotooltip_bigDNA replication: usually the process of starting with a dsDNA molecule and ending with two identical copies of that dsDNA molecule. In most cases, “replication” implies DNA replication. occurs in dividing cells. Dividing cells with unlimited resources also replicate exponentially.
3)
Actually, the slowest step in Sanger sequencingplugin-autotooltip__default plugin-autotooltip_bigSanger sequencing: an older DNA sequencing technology invented in 1977 by Fredrick Sanger. is the separating of synthesized DNAplugin-autotooltip__default plugin-autotooltip_bigDNA: deoxyribonucleic acid. The genetic material for nearly all life on Earth. fragments through a gel. The actual chemistry takes just 10 minutes, but preparing and running the gel can take several hours.
chapter_07.txt · Last modified: 2025/03/14 07:29 by mike