Differences

This shows you the differences between two versions of the page.

--- chapter_06 [2024/08/19 23:50] – [Questions and exercises] mike
+++ chapter_06 [2025/03/22 07:55] (current) – [Thinking about DNA and genes at scale] mike
@@ Line 1: / Line 1: @@
-<typo fs:x-large>Chapter 06. The physical nature of the gene</typo>
+<-chapter_05|Chapter 05^table_of_contents|Table of Contents^chapter_07|Chapter 07->
-In Chapters 01-06, we defined genes conceptually as units of function and inheritance. In this chapter we will start with a new way to define genes: a physical definition of the gene. Conceptually this is the simplest way to define a gene and it will give us an excuse to briefly review some of the molecular biology that you probably already know. Our focus is not on the details of molecular biology, but on the role of genes in terms of their informational content and on the role of DNA as an informational molecule.
+<typo fs:x-large>Chapter 06. The physical nature of the %%gene%%</typo>
+In Chapters 01-05, we defined genes conceptually as units of function and inheritance. In this chapter we will start with a new way to define genes: a physical definition of the gene. Conceptually this is the simplest way to define a gene and it will give us an excuse to briefly review some of the molecular biology that you probably already know. Our focus is not on the details of molecular biology, but on the role of genes in terms of their informational content and on the role of DNA as an informational molecule.
 ===== Genes are (usually) made of DNA =====
@@ Line 15: / Line 17: @@
 </figure>
-How was DNA as the physical genetic material discovered? Below is a quick summary of the history. In 1923, a British physician named Frederick Griffith discovered that the bacterium //Streptococcus pneunmoniae// could be converted, or transformed, from a non-disease-causing strain (R) to a virulent (i.e., disease-causing) strain (S). To achieve this, he mixed heat-killed S bacteria with live R bacteria and found that the R bacteria were now virulent. Not only did they become virulent, new bacteria formed from these transformed R bacteria were also virulent. In other words, they had acquired a new phenotype that could be genetically inherited in future generations. Griffith didn't know what was causing this transformation, but he named it the transforming principle.
+How was DNA identified as the physical genetic material? Below is a quick summary of the history. In 1928, a British physician named [[wp>fredrick_griffith|Frederick Griffith]] discovered that the bacterium //Streptococcus pneumoniae// could be converted, or transformed, from a non-disease-causing strain (R) to a virulent (i.e., disease-causing) strain (S). To achieve this, he mixed heat-killed S bacteria with live R bacteria and found that the R bacteria were now virulent. Not only did they become virulent, but the bacteria descended from these transformed R bacteria were also virulent. In other words, they had acquired a new phenotype that was genetically inherited by future generations. Griffith didn't know what was causing this transformation, but he called it the transforming principle.
-Fast forward to 1945, and three scientists named Avery, MacLeod, and McCarty set out to isolate the so-called transforming principle that Griffith discovered. They tested whether different types of chemical components purified from the heat-killed bacteria could function to transform R bacteria. They found that proteins, lipids, and carbohydrates did not transform R bacteria, but nucleic acids (and more specifically DNA) did. Initially, scientists had a hard time believing their results, because DNA was known to be chemically less complex than proteins. Most scientists at the time did not think the relatively simple DNA molecule could store all the instructions essential for life and many believed that proteins were more likely to be the genetic material. Between 1951-1952, Hershey and Chase devised experiments using live bacteriophage and radioactively labeled proteins and DNA to show that DNA and not proteins conferred heritability to the bacteriophage. All this work collectively convinced scientists that DNA was indeed the genetic material. Still, no one knew how DNA stored information.
+Fast forward to 1944, when three scientists named [[wp>Avery–MacLeod–McCarty_experiment|Avery, MacLeod, and McCarty]] set out to isolate the so-called transforming principle that Griffith had discovered. They tested whether different types of chemical components purified from the heat-killed bacteria could transform R bacteria. They found that proteins, lipids, and carbohydrates did not transform R bacteria, but nucleic acids (and more specifically DNA) did. Initially, scientists had a hard time believing these results, because DNA was known to be chemically less complex than proteins. Most scientists at the time did not think the relatively simple DNA molecule could store all the instructions essential for life, and many believed that proteins were more likely to be the genetic material. Between 1951 and 1952, [[wp>Hershey–Chase_experiment|Hershey and Chase]] devised experiments using live bacteriophage with radioactively labeled proteins or DNA to show that DNA, and not protein, carries the heritable information of the bacteriophage. All this work collectively convinced scientists that DNA was indeed the genetic material. Still, no one knew how DNA stored information.
 In some viruses, genes are made of RNA; we do not discuss these further in this book.
@@ Line 24: / Line 26: @@
-In 1953, Watson and Crick, based on the X-ray crystallography data of Rosalind Franklin and their own modeling work, deduced that the structure of DNA is a double helix. The double helix consists of two molecules called single stranded DNA (ssDNA). Sometimes we abbreviate this and just call each ssDNA a strand. Each strand is composed of a sequence of polymerized subunits called nucleotides, of which there are four: guanine (G), adenine (A), thymine (T), and cytosine (C). Each strand is asymmetrical; one end (the end we usually think of as being the start) is called the 5' end (pronounced as "five prime end"), and the other end is called the 3' end (pronounced as "three prime end"). The two strands twist together in a helical shape and are held together by weak hydrogen bonds in an antiparallel orientation; that is, if one strand is in a 5' to 3' left-to-right orientation, the other strand pairs with it in a 3' to 5' left to right orientation. The two strands held together forms double-stranded DNA (dsDNA).
+In 1953, [[wp>Molecular_Structure_of_Nucleic_Acids:_A_Structure_for_Deoxyribose_Nucleic_Acid|Watson and Crick]], based on the X-ray crystallography data of [[wp>Rosalind_Franklin|Rosalind Franklin]] and their own modeling work, deduced that the structure of DNA is a double helix. The double helix consists of two single-stranded DNA (ssDNA) molecules. Sometimes we abbreviate this and just call each ssDNA a strand. Each strand is composed of a sequence of polymerized subunits called nucleotides, of which there are four: guanine (G), adenine (A), thymine (T), and cytosine (C). Each strand is asymmetrical; one end (the end we usually think of as the start) is called the 5' end (pronounced "five prime end"), and the other end is called the 3' end (pronounced "three prime end"). The two strands twist together in a helical shape and are held together by weak hydrogen bonds in an antiparallel orientation; that is, if one strand runs 5' to 3' from left to right, the strand paired with it runs 3' to 5' from left to right. The two strands held together form double-stranded DNA (dsDNA).
 It was not the helical structure itself but the discovery of base pairing of the four nucleotides in the chemical structure of the strands that revealed how information could be encoded in a molecule. Base pairing is the key feature of DNA that allows it to store and propagate information. G and C always pair with each other, and A and T always pair with each other.
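Because each base has exactly one partner, either strand fully specifies the other. The short Python sketch below applies the G-C and A-T pairing rule; the function names and the example sequence are made up for illustration. Since the strands are antiparallel, the partner strand is usually reported reversed so that it also reads 5' to 3'.

```python
# Watson-Crick base-pairing rules: G pairs with C, A pairs with T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the partner of each base, in the same left-to-right order (so it reads 3'->5')."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]

top = "ATGCGTTA"                  # hypothetical strand, written 5'->3'
print(complement(top))            # TACGCAAT  (partner strand, read 3'->5')
print(reverse_complement(top))    # TAACGCAT  (same partner strand, read 5'->3')
```

Applying `reverse_complement` twice returns the original strand, which is one way of seeing that the two strands carry the same information.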
@@ Line 48: / Line 50: @@
 {{ :transcription.jpg?400 |}}
 <caption>
-RNA transcription. Unlike with DNA replication, only one of the two strands is used as the template for transcription. The DNA strand that is used as a template is called (not surprisingly) the template strand, and the other DNA strand is called the coding strand. Note that the newly transcribed RNA has the same sequence as the template strand, except that uracil (U) is substituted for thymidine (T) in the RNA. Credit: M. Chao.
+RNA transcription. Unlike in DNA replication, only one of the two strands is used as the template for transcription. The DNA strand that is used as a template is called (not surprisingly) the template strand, and the other DNA strand is called the coding strand. Note that the newly transcribed RNA is complementary to the template strand and therefore has the same sequence as the coding strand, except that uracil (U) is substituted for thymine (T) in the RNA. Credit: M. Chao.
 </caption>
 </figure>
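The relationship between the two DNA strands and the mRNA can be sketched in a few lines of Python. This is only a sketch of the sequence bookkeeping, not a model of RNA polymerase; the function names and sequences are hypothetical, and both DNA strands are written 5' to 3'.

```python
# RNA pairing rules used when building RNA on a DNA template: DNA A pairs with RNA U.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_from_template(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made by pairing with the template strand.

    The template is given 5'->3'; because the RNA is antiparallel to it,
    we walk the template backwards so the result reads 5'->3'.
    """
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(template_5to3.upper()))

def coding_to_mrna(coding_5to3: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")

coding = "ATGGCTTAA"      # hypothetical coding strand, 5'->3'
template = "TTAAGCCAT"    # its partner (template) strand, also written 5'->3'
print(transcribe_from_template(template))  # AUGGCUUAA
print(coding_to_mrna(coding))              # AUGGCUUAA
```

Both routes give the same mRNA, which is why the coding strand is a convenient way to write down what a gene "says".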
@@ Line 62: / Line 64: @@
 </figure>
-There are many different kinds of RNA molecules, but here we are primarily concerned with messenger RNA (mRNA), which code for proteins. In eukaryotes, mRNAs are transcribed in the nucleus((Technically, the protein-coding RNAs are first transcribed as pre-mRNAs, which much be processed before it is exported to the cytoplasm.)), but are transported into the cytoplasm for translation into proteins (Fig. {{ref>Fig5}}). A portion of the mRNA sequence called the coding sequence begins with the first RNA sequence AUG (the start codon) that appears starting from the 5' end. Each set of three consecutive bases that are read after the AUG start codon are called codons, each of which codes for a different amino acid (Fig. {{ref>Fig4}}). The amino acids are joined together in order via peptide bonds. This continues until a stop codon (UAA, UAG, or UGA) is reached. Translation (Fig. {{ref>Fig5}}) uses a large and complex enzyme called a ribosome that reads mRNA in a 5' to 3' direction.
+There are many different kinds of RNA molecules, but here we are primarily concerned with messenger RNA (mRNA), which codes for proteins. In eukaryotes, mRNAs are transcribed in the nucleus((Technically, the protein-coding RNAs are first transcribed as pre-mRNAs, which must be processed before they are exported to the cytoplasm.)), but are transported into the cytoplasm for translation into proteins (Fig. {{ref>Fig5}}). The portion of the mRNA called the coding sequence begins at the first AUG (the start codon) encountered when reading from the 5' end. Starting with this AUG, the mRNA is read in consecutive sets of three bases called codons, each of which specifies one amino acid (Fig. {{ref>Fig4}}); because there are 64 codons but only 20 amino acids, most amino acids are specified by more than one codon. The amino acids are joined together in order via peptide bonds. This continues until a stop codon (UAA, UAG, or UGA) is reached. This process is called translation (Fig. {{ref>Fig5}}), and it uses a large and complex enzyme called a ribosome that reads mRNA in the 5' to 3' direction.
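The reading rule described above (find the first AUG, read codons three bases at a time, stop at UAA, UAG, or UGA) can be sketched in Python. The sketch assumes the standard genetic code and the simplified "first AUG" rule from the text; real ribosomes use additional sequence context to choose a start codon. Amino acids are shown as one-letter codes and `*` marks a stop codon; the example mRNA is hypothetical.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order for each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP = "*"

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                          # no start codon: no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # only complete codons
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == STOP:
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCUUGGUAAGC"))  # MAW  (AUG-GCU-UGG, then stop at UAA)
```

Counting the entries in `CODON_TABLE` that map to the same letter (for example, six codons map to `L`) shows the redundancy of the code directly.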
 <figure Fig5>
@@ Line 84: / Line 86: @@
-The haploid human genome is approx. 3x10<sup>9</sup> bp in size. This means that the average human chromosome is $\frac{3\times10^9}{23}= 1.3 \times 10^8$ bp, or over one hundred million bp long – that is a very large molecule! Humans and most mammals have about 20,000 genes (which is surprisingly not that many more than Drosophila or //C. elegans//). This means that on average each human chromosome contains about 1000 genes. Genes are typically 10<sup>3</sup> - 10<sup>4</sup> base pairs in size, although they can be much larger.
+The haploid human genome is approx. 3x10<sup>9</sup> bp in size. This means that the average human chromosome is $\frac{3\times10^9}{23}= 1.3 \times 10^8$ bp, or over one hundred million bp long – that is a very large molecule! Humans and most mammals have about 20,000 genes (which is surprisingly not that many more than Drosophila). This means that on average each human chromosome contains about 1000 genes. Genes are typically 10<sup>3</sup> - 10<sup>4</sup> base pairs in size, although they can be much larger.
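The estimates in the paragraph above are simple divisions. The Python sketch below redoes them with the approximate figures from the text; the last line, the average stretch of genome per gene, is an extra derived number added for illustration.

```python
# Approximate figures quoted in the text above.
HAPLOID_GENOME_BP = 3e9   # haploid human genome size, in bp
CHROMOSOMES = 23          # chromosomes in a haploid human genome
GENES = 20_000            # approximate human gene count

bp_per_chromosome = HAPLOID_GENOME_BP / CHROMOSOMES
genes_per_chromosome = GENES / CHROMOSOMES
bp_per_gene = HAPLOID_GENOME_BP / GENES   # average stretch of genome per gene

print(f"{bp_per_chromosome:.2e} bp per chromosome")        # 1.30e+08 bp per chromosome
print(f"{genes_per_chromosome:.0f} genes per chromosome")  # 870 genes per chromosome
print(f"{bp_per_gene:,.0f} bp of genome per gene")         # 150,000 bp of genome per gene
```

About 870 genes per chromosome is on the order of 1000, and one gene per roughly 150,000 bp of genome, compared with typical gene sizes of 10<sup>3</sup> to 10<sup>4</sup> bp, already hints that genes are widely spaced.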
-For example, the human dystrophin gene is 2 x 10<sup>6</sup> base pairs. Mutations in dystrophin result in a debilitating disease called Duchenne muscular dystrophy (DMD), which affects humans but also dogs, especially Golden Retrievers. Interestingly, DMD is a sex-linked trait in both humans and dogs - the dystrophin gene is on the $X$ chromosome in both species, and males are affected more than females (Fig. {{ref>Fig6}}. Note that not all of the DNA sequence of a gene may be coding sequence; that is, not all of the DNA sequence of a gene may correspond to codons for amino acids. In fact, most DNA is not gene coding sequence. Most eukaryotic genes (and in rare cases prokaryotic genes) contain DNA sequences called introns; when these intron sequences are transcribed into RNA they are removed from the transcript by a process called splicing (see [[chapter_12|Chapter 12]] for more details). However, the majority of DNA is non-coding sequence in between genes called intergenic regions.
+For example, the human dystrophin gene is about 2 x 10<sup>6</sup> base pairs. Mutations in dystrophin result in a debilitating disease called Duchenne muscular dystrophy (DMD), which affects not only humans but also dogs, especially Golden Retrievers. Interestingly, DMD is a sex-linked trait in both humans and dogs; the dystrophin gene is on the $X$ chromosome in both species, and males are affected more often than females (Fig. {{ref>Fig6}}). Note that not all of the DNA sequence of a gene is coding sequence; that is, not all of it corresponds to codons for amino acids. Most eukaryotic genes (and, in rare cases, prokaryotic genes) contain DNA sequences called introns; when these intron sequences are transcribed into RNA, they are removed from the transcript by a process called splicing (see [[chapter_12|Chapter 12]] for more details). Beyond that, most DNA is not gene coding sequence at all: the majority of DNA is non-coding sequence lying between genes, in so-called intergenic regions.
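The bookkeeping of splicing (cut out the introns, join the remaining exons in order) can be sketched in Python. This is only an illustration under stated assumptions: the intron coordinates are supplied by hand as hypothetical 0-based, half-open intervals, whereas in the cell the splicing machinery recognizes the intron boundaries from sequence signals. The example intron begins with GU and ends with AG, as most introns do.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals [start, end) (0-based) and join the exons in order."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # final exon
    return "".join(exons)

# Hypothetical pre-mRNA: exon "AUGGC", intron "GUAAGUCCAG", exon "UUGGUAA".
pre_mrna = "AUGGCGUAAGUCCAGUUGGUAA"
print(splice(pre_mrna, [(5, 15)]))  # AUGGCUUGGUAA
```

Note that the intron in this example contains UAA, which would be an in-frame stop codon if it were not removed before translation.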
 <figure Fig6>
@@ Line 105: / Line 107: @@
-Mutations are an altered version of a gene that have an altered phenotype that is markedly different than most individuals in a population. Most of the time, the altered phenotype is caused by a change in protein function. If you are an evolutionary or population biologist, then you might use the term variation instead of mutation.
+A mutation is an altered version of a gene that produces a phenotype markedly different from that of most individuals in a population. Most of the time, the altered phenotype is caused by a change in protein function. If you are an evolutionary or population biologist, then you might use the term variation (or variant) instead of mutation.
-To examine these ideas more closely, let's look at a mutation in the Drosophila shibire gene, which we first saw in Chapter 3 (Table 6.1). A particular mutant allele of shibire codes for an altered heat sensitive protein that is required for synaptic transmission (we can also call it a mutant protein). When flies that carry this mutation are warmed up, they become paralyzed.
+To examine these ideas more closely, let's look at a mutation in the Drosophila $shibire$ gene, which we first saw in [[chapter_03|Chapter 03]] (Table {{ref>Tab1}}). A particular mutant allele of $shibire$ codes for an altered, heat-sensitive form of a protein that is required for synaptic transmission (we can also call it a mutant protein). When flies that carry this mutation are warmed up, they become paralyzed.
+<table Tab1>
+<columns 100% *100%*>
+^  Genes are expressed to give…  ^  proteins, which function to control...  ^  cellular processes, which can be...  ^  observed in the organism  ^
+|  $shibire^+$ (wild type allele)  |  dynamin (fully functional protein)  |  synaptic signaling functions normally  |  normal function (not paralyzed)  |
+|  $shibire^-$ (mutant allele)  |  heat-sensitive dynamin  |  synaptic signaling defective when temperature raised  |  paralyzed fly when temperature raised  |
+</columns>
+<caption>
+The relationship between genotype, protein function, cellular function, and organism-level phenotype.
+</caption>
+</table>
-Genes are expressed to give…	proteins, which function to control... 	cellular processes, which can be...	observed in the organism
+This example illustrates two powerful aspects of genetic analysis. First, we can follow molecular changes in the DNA that are not easily observable, such as the $shibire$ mutation, through their easily observable consequences, such as a paralyzed fly. This has great practical value: even though it is technically possible to track different alleles by directly analyzing the DNA sequence, in practice this is much more difficult than simply looking at a phenotype. Second, we can study the function of an individual protein very precisely by examining the consequences of altering just that one protein in an otherwise normal organism.
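Tracking an allele directly at the DNA level amounts to comparing its sequence with the wild-type sequence. The Python sketch below does this for the simplest case, equal-length sequences that differ only by base substitutions (no insertions or deletions); the sequences are short made-up examples, not the real $shibire$ gene.

```python
def point_differences(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, wild-type base, mutant base) for every mismatch.

    Assumes equal-length sequences, i.e. substitutions only (no insertions/deletions).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, w, m)
        for i, (w, m) in enumerate(zip(wild_type.upper(), mutant.upper()))
        if w != m
    ]

wt = "ATGGCTTGGTAA"   # hypothetical wild-type coding sequence
mut = "ATGGCTTAGTAA"  # hypothetical allele carrying one substitution
print(point_differences(wt, mut))  # [(8, 'G', 'A')]
```

In this made-up example the substitution turns the third codon, TGG, into TAG, which is a stop codon; whether such a change is visible as a phenotype is exactly what the genetic analysis described above reveals.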
-shibire+ (wild type allele)	dynamin (fully functional protein)	synaptic signaling functions normally	normal function
-shibire- (mutant allele)	heat-sensitive dynamin	synaptic signaling defective when temperature raised	paralyzed fly when temperature raised
-Table 6.1. The relationship between genotype, protein function, cellular function, and organism-level phenotype.
-This example illustrates two powerful aspects of genetic analysis. First, we can follow molecular changes in the DNA (which are not easily observable) such as the shibire mutation as they are revealed by easily observable consequences of the mutation such as a paralyzed fly. This has a great deal of practical implication; even though technically it's possible to track different alleles by directly analyzing the DNA sequence, in practice this is much more difficult to do than simply looking at a phenotype. Second, we have a very precise way of studying the function of individual proteins by examining the consequences of altering just that one protein function in an otherwise normal organism.
-The physical definition of the gene is a very good one but there are many instances where we wish to study genes whose DNA sequences are not known. For example, we have isolated a new mutant fly that is also paralyzed, and we want to know whether this mutation is also in the shibire gene. We can see from Chapter 2 and 3 that we can answer this question without knowledge of the DNA sequence either by a test for gene function (complementation test) or by a test of the chromosomal position of the mutation by recombinatorial mapping. In practice, these other ways of defining genes by function or by position can sometimes be more useful than a definition based on the DNA sequence.
+The physical definition of the gene is a very good one, but there are many instances where we wish to study genes whose DNA sequences are not known. For example, suppose we have isolated a new mutant fly that is also paralyzed, and we want to know whether this mutation is also in the $shibire$ gene. As we saw in Chapters [[chapter_02|02]] and [[chapter_03|03]], we can answer this question without knowing the DNA sequence, either by a test of gene function (the complementation test) or by a test of the chromosomal position of the mutation (recombination mapping). In practice, these other ways of defining genes, by function or by position, can sometimes be more useful than a definition based on the DNA sequence.
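The logic of the complementation test can be sketched as a tiny Python model. It assumes both mutations are recessive loss-of-function alleles, and it represents each homologous chromosome of a trans-heterozygous fly by the set of genes broken on it; `"geneX"` is a hypothetical placeholder for some other gene. The experimenter only sees the phenotype and uses it to infer whether the two mutations hit the same gene.

```python
def trans_heterozygote_is_mutant(broken_on_a: set[str], broken_on_b: set[str]) -> bool:
    """Predict the phenotype of a fly carrying one mutation on each homolog.

    With recessive loss-of-function alleles, the mutant phenotype appears only if
    some gene has no working copy, i.e. the same gene is broken on both homologs.
    """
    return bool(broken_on_a & broken_on_b)

# Case 1: the new paralyzed mutation is in shibire -> fails to complement.
print(trans_heterozygote_is_mutant({"shibire"}, {"shibire"}))  # True
# Case 2: it is in a different (hypothetical) gene -> the mutations complement.
print(trans_heterozygote_is_mutant({"shibire"}, {"geneX"}))    # False
```

A paralyzed trans-heterozygote therefore suggests that the new mutation is in $shibire$, while a normal fly suggests that it is in a different gene, all without reading any DNA sequence.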
-It's also important to note that we haven't actually defined a gene yet! We've defined the physical material of genes (DNA), the intermediate material between gene and gene product (RNA), and the gene product itself that in most cases provides function (protein). Thinking of a typical mammal: if we have approximately 20,000 genes, and the average size of a gene is approximately 10,000 bp, that accounts for 2x108 bp worth of DNA (this is just back of the envelope math and not meant to be accurate). But a typical mammalian genome (humans, for instance) is close to 3x109 bp of DNA. This rough calculation tells us that most of the genome must not contain genes ! In other words, in a mammal most DNA is not part of a gene. In the next chapter we will see how we can analyze the genome to find genes within DNA sequences.
+It's also important to note that we haven't actually defined a physical gene yet! We've defined the physical material of genes (DNA), the intermediate material between gene and gene product (RNA), and the gene product itself that in most cases provides function (protein). Consider a typical mammal: if we have approximately 20,000 genes, and the average size of a gene is approximately 10,000 bp, that accounts for 2x10<sup>8</sup> bp worth of DNA (this is just back-of-the-envelope math and not meant to be accurate). But a typical mammalian genome (humans, for instance) is close to 3x10<sup>9</sup> bp of DNA. This rough calculation tells us that most of the genome must not contain genes((This rough calculation suggests that genes make up roughly 7% of a mammalian genome. In reality, the protein-coding portion of genes makes up closer to 2% of the human genome.))! In other words, in a mammal most DNA is not part of a gene. In the next chapter we will see how we can analyze the genome to find genes within DNA sequences.
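Written out as a worked equation, using the same rough numbers as the paragraph above:

$$
\begin{aligned}
\text{DNA in genes} &\approx (2\times10^{4}\ \text{genes}) \times (10^{4}\ \text{bp per gene}) = 2\times10^{8}\ \text{bp}\\
\text{fraction of genome in genes} &\approx \frac{2\times10^{8}\ \text{bp}}{3\times10^{9}\ \text{bp}} \approx 0.067 \approx 7\%
\end{aligned}
$$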
 ===== Questions and exercises =====
-Exercise 6.1. We know that mutations in genes cause changes in observable phenotypes controlled by that gene. But why might some changes in the DNA sequence of a gene not alter the phenotype controlled by that gene? There are lots of possible correct explanations!
+Conceptual question: We know that mutations in genes cause changes in observable phenotypes controlled by that gene. But why might some changes in the DNA sequence of a gene not alter the phenotype controlled by that gene? There are lots of possible correct explanations!