chapter_06

Differences

This shows you the differences between two versions of the page.

chapter_06 [2024/08/30 18:46] – [Protein translation] mike
chapter_06 [2025/03/22 07:55] (current) – [Thinking about DNA and genes at scale] mike
Line 1: Line 1:
-<typo fs:x-large>Chapter 06. The physical nature of the gene</typo>
+<-chapter_05|Chapter 05^table_of_contents|Table of Contents^chapter_07|Chapter 07->
 
-In Chapters 01-06, we defined genes conceptually as units of function and inheritance. In this chapter we will start with a new way to define genes: a physical definition of the gene. Conceptually this is the simplest way to define a gene and it will give us an excuse to briefly review some of the molecular biology that you probably already know. Our focus is not on the details of molecular biology, but on the role of genes in terms of their informational content and on the role of DNA as an informational molecule.
+<typo fs:x-large>Chapter 06. The physical nature of the %%gene%%</typo>
+
+In Chapters 01-05, we defined genes conceptually as units of function and inheritance. In this chapter we will start with a new way to define genes: a physical definition of the gene. Conceptually this is the simplest way to define a gene and it will give us an excuse to briefly review some of the molecular biology that you probably already know. Our focus is not on the details of molecular biology, but on the role of genes in terms of their informational content and on the role of DNA as an informational molecule.
 
 ===== Genes are (usually) made of DNA =====
Line 48: Line 50:
 {{ :transcription.jpg?400 |}}
 <caption>
-RNA transcription. Unlike with DNA replication, only one of the two strands is used as the template for transcription. The DNA strand that is used as a template is called (not surprisingly) the template strand, and the other DNA strand is called the coding strand. Note that the newly transcribed RNA is complementary to the template strand and therefore has the same sequence as the coding strand, except that uracil (U) is substituted for thymidine (T) in the RNA. Credit: M. Chao.
+RNA transcription. Unlike with DNA replication, only one of the two strands is used as the template for transcription. The DNA strand that is used as a template is called (not surprisingly) the template strand, and the other DNA strand is called the coding strand. Note that the newly transcribed RNA is complementary to the template strand and therefore has the same sequence as the coding strand, except that uracil (U) is substituted for thymine (T) in the RNA. Credit: M. Chao.
 </caption>
 </figure> </figure>
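The strand relationships in transcription can be checked with a short sketch. This is only an illustration: the function names and the nine-base sequence are made up for the example and do not come from the chapter. The RNA is built by base-pairing against the template strand, so it should come out matching the coding strand with U in place of T.

```python
# Illustrative sketch only: helper names and the example sequence are hypothetical.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding: str) -> str:
    """Return the template strand (written 5'->3') for a coding strand (5'->3')."""
    # The two strands are antiparallel, so complement each base and reverse.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding))


def transcribe(template: str) -> str:
    """Build the RNA (written 5'->3') by base-pairing against the template strand."""
    # The polymerase reads the template 3'->5', so walk it in reverse.
    return "".join(RNA_PAIRING[base] for base in reversed(template))


coding = "ATGGCATTC"                 # made-up coding-strand sequence
template = template_from_coding(coding)
rna = transcribe(template)

print("coding  :", coding)
print("template:", template)
print("RNA     :", rna)
```

Comparing the printed lines, the RNA differs from the coding strand only by U replacing T, while it is complementary to the template strand.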
Line 84: Line 86:
  
  
-The haploid human genome is approx. 3x10<sup>9</sup> bp in size. This means that the average human chromosome is $\frac{3\times10^9}{23}= 1.3 \times 10^8$ bp, or over one hundred million bp long – that is a very large molecule! Humans and most mammals have about 20,000 genes (which is surprisingly not that many more than Drosophila or //C. elegans//). This means that on average each human chromosome contains about 1000 genes. Genes are typically 10<sup>3</sup> - 10<sup>4</sup> base pairs in size, although they can be much larger.
+The haploid human genome is approx. 3x10<sup>9</sup> bp in size. This means that the average human chromosome is $\frac{3\times10^9}{23}= 1.3 \times 10^8$ bp, or over one hundred million bp long – that is a very large molecule! Humans and most mammals have about 20,000 genes (which is surprisingly not that many more than Drosophila). This means that on average each human chromosome contains about 1000 genes. Genes are typically 10<sup>3</sup> - 10<sup>4</sup> base pairs in size, although they can be much larger.
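The per-chromosome averages above are simple divisions, and it can help to see them written out. The following is a minimal sketch using the round numbers quoted in the text; the variable names are ours and the values are approximations, not measurements.

```python
# Back-of-the-envelope averages using the round numbers from the text.
GENOME_BP = 3e9        # approximate haploid human genome size, in base pairs
CHROMOSOMES = 23       # chromosomes in a haploid human set
GENES = 20_000         # approximate number of human genes

avg_chromosome_bp = GENOME_BP / CHROMOSOMES
avg_genes_per_chromosome = GENES / CHROMOSOMES

print(f"average chromosome size: {avg_chromosome_bp:.1e} bp")                 # 1.3e+08 bp
print(f"average genes per chromosome: {round(avg_genes_per_chromosome)}")     # 870
```

The second figure comes out near 870, which the text rounds to "about 1000"; at this level of precision only the order of magnitude matters.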
  
-For example, the human dystrophin gene is 2 x 10<sup>6</sup> base pairs. Mutations in dystrophin result in a debilitating disease called Duchenne muscular dystrophy (DMD), which affects humans but also dogs, especially Golden Retrievers. Interestingly, DMD is a sex-linked trait in both humans and dogs - the dystrophin gene is on the $X$ chromosome in both species, and males are affected more than females (Fig. {{ref>Fig6}}. Note that not all of the DNA sequence of a gene may be coding sequence; that is, not all of the DNA sequence of a gene may correspond to codons for amino acids. In fact, most DNA is not gene coding sequence. Most eukaryotic genes (and in rare cases prokaryotic genes) contain DNA sequences called introns; when these intron sequences are transcribed into RNA they are removed from the transcript by a process called splicing (see [[chapter_12|Chapter 12]] for more details). However, the majority of DNA is non-coding sequence in between genes called intergenic regions.
+For example, the human dystrophin gene is 2 x 10<sup>6</sup> base pairs. Mutations in dystrophin result in a debilitating disease called Duchenne muscular dystrophy (DMD), which affects humans but also dogs, especially Golden Retrievers. Interestingly, DMD is a sex-linked trait in both humans and dogs - the dystrophin gene is on the $X$ chromosome in both species, and males are affected more than females (Fig. {{ref>Fig6}}). Note that not all of the DNA sequence of a gene may be coding sequence; that is, not all of the DNA sequence of a gene may correspond to codons for amino acids. In fact, most DNA is not gene coding sequence. Most eukaryotic genes (and in rare cases prokaryotic genes) contain DNA sequences called introns; when these intron sequences are transcribed into RNA they are removed from the transcript by a process called splicing (see [[chapter_12|Chapter 12]] for more details).
+However, the majority of DNA is non-coding sequence in between genes called intergenic regions.
  
 <figure Fig6> <figure Fig6>
Line 105: Line 107:
  
  
-A mutation is an altered version of a gene that produces a phenotype markedly different from that of most individuals in a population. Most of the time, the altered phenotype is caused by a change in protein function. If you are an evolutionary or population biologist, then you might use the term variation instead of mutation.
+A mutation is an altered version of a gene that produces a phenotype markedly different from that of most individuals in a population. Most of the time, the altered phenotype is caused by a change in protein function. If you are an evolutionary or population biologist, then you might use the term variation (or variant) instead of mutation.
  
 To examine these ideas more closely, let's look at a mutation in the Drosophila $shibire$ gene, which we first saw in [[chapter_03|Chapter 03]] (Table {{ref>Tab1}}). A particular mutant allele of $shibire$ codes for an altered heat sensitive protein that is required for synaptic transmission (we can also call it a mutant protein). When flies that carry this mutation are warmed up, they become paralyzed.
Line 125: Line 127:
 The physical definition of the gene is a very good one but there are many instances where we wish to study genes whose DNA sequences are not known. For example, we have isolated a new mutant fly that is also paralyzed, and we want to know whether this mutation is also in the $shibire$ gene. We can see from Chapters [[chapter_02|02]] and [[chapter_03|03]] that we can answer this question without knowledge of the DNA sequence either by a test for gene function (complementation test) or by a test of the chromosomal position of the mutation by recombination mapping. In practice, these other ways of defining genes by function or by position can sometimes be more useful than a definition based on the DNA sequence.
  
-It's also important to note that we haven't actually defined a gene yet! We've defined the physical material of genes (DNA), the intermediate material between gene and gene product (RNA), and the gene product itself that in most cases provides function (protein). Thinking of a typical mammal: if we have approximately 20,000 genes, and the average size of a gene is approximately 10,000 bp, that accounts for 2x10<sup>8</sup> bp worth of DNA (this is just back of the envelope math and not meant to be accurate). But a typical mammalian genome (humans, for instance) is close to 3x10<sup>9</sup> bp of DNA. This rough calculation tells us that most of the genome must not contain genes((This rough calculation suggests that genes make up ~10% of a mammalian genome. In reality this number is closer to 2%.))! In other words, in a mammal most DNA is not part of a gene. In the next chapter we will see how we can analyze the genome to find genes within DNA sequences.
+It's also important to note that we haven't actually defined a physical gene yet! We've defined the physical material of genes (DNA), the intermediate material between gene and gene product (RNA), and the gene product itself that in most cases provides function (protein). Thinking of a typical mammal: if we have approximately 20,000 genes, and the average size of a gene is approximately 10,000 bp, that accounts for 2x10<sup>8</sup> bp worth of DNA (this is just back of the envelope math and not meant to be accurate). But a typical mammalian genome (humans, for instance) is close to 3x10<sup>9</sup> bp of DNA. This rough calculation tells us that most of the genome must not contain genes((This rough calculation suggests that genes make up ~10% of a mammalian genome. In reality this number is closer to 2%.))! In other words, in a mammal most DNA is not part of a gene. In the next chapter we will see how we can analyze the genome to find genes within DNA sequences.
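The back-of-the-envelope estimate in this paragraph can be written out as a few lines of arithmetic. This is a rough sketch with the same round inputs as the text; the variable names are ours, and the result is an order-of-magnitude estimate rather than a measured value.

```python
# Rough estimate of how much of a mammalian genome is occupied by genes,
# using the round numbers from the text (not measured values).
GENES = 20_000          # approximate gene count
AVG_GENE_BP = 10_000    # assumed average gene size, in base pairs
GENOME_BP = 3e9         # approximate genome size, in base pairs

genic_bp = GENES * AVG_GENE_BP
genic_fraction = genic_bp / GENOME_BP

print(f"DNA in genes: {genic_bp:.0e} bp")             # 2e+08 bp
print(f"fraction of genome: {genic_fraction:.1%}")    # 6.7%
```

With these inputs the fraction is a little under 7%, which is on the order of the ~10% quoted in the footnote. Either way, more than 90% of the genome falls outside genes under these assumptions.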
  
 ===== Questions and exercises =====
chapter_06.1725068778.txt.gz · Last modified: 2024/08/30 18:46 by mike