Summary
The genome contains the hereditary information on the structure and function of a cell or organism. This information is stored as a sequence of bases in DNA. A relatively small percentage of DNA codes for proteins and ribonucleic acids (RNAs), while a large part of the genome consists of sequences without a clear function. The conversion of the information stored in DNA into functional molecules (RNA and proteins) is termed gene expression. Gene expression occurs in two stages: transcription and translation. During transcription, DNA is copied into RNA. RNA is then used to synthesize proteins during translation.
The key enzymes of transcription are DNA-dependent RNA polymerases. These enzymes synthesize RNA molecules based on the genes encoded in DNA, which contain starting sites (promoters) where transcription begins. Transcription factors are required to recognize the promoter. RNA polymerase moves along the template strand of the double-stranded DNA and synthesizes the RNA strand until the end of the transcribed DNA segment (termination site) is reached. In eukaryotes, the newly formed primary transcript is further modified, e.g., to make it available for protein synthesis.
Gene expression is strongly regulated at all levels. Some genes are expressed in all cells because they are required for basic cellular functions (housekeeping genes; constitutive expression). Other genes are only active in certain cells, and their expression is regulated by a variety of mechanisms. Genes can be activated or silenced, and their transcription depends on the presence of specific DNA-binding proteins. Newly formed RNA may also be degraded by various mechanisms before it is used in protein synthesis. Regulatory mechanisms also exist at the translational level. Although each cell in an organism contains the same DNA, the regulated expression of certain genes causes cells to specialize and assume different functions, e.g., as muscle cells or hepatocytes.
Overview
- Gene expression: conversion of genetic information stored in DNA into a functional gene product (RNA and proteins)
- Protein synthesis: the process of gene expression (consisting of transcription and translation), including post-transcriptional modifications (see the article on translation and protein synthesis for more information)
- Central dogma of molecular biology: genetic information flows in one direction, from DNA to RNA to protein
- DNA → (transcription) → RNA → (translation) → protein
- Exception: retroviruses, which are able to produce DNA from RNA using their own enzyme reverse transcriptase (reverse transcription)
In protein synthesis, DNA is initially transcribed into mRNA (transcription) and mRNA is translated into an amino acid chain (translation).
Transcription
In transcription, DNA serves as a template to produce a complementary RNA molecule. Only one strand of the double-stranded DNA (dsDNA) is read.
- DNA strands
- Sense (coding) strand: the DNA strand that is complementary to the antisense strand; its base sequence is identical to that of the transcribed mRNA, except that it contains thymine instead of uracil. The sense strand does not serve as a template for transcription.
- Antisense (template) strand: the DNA strand that is used as a template for transcription to produce the complementary mRNA strand
- Promoter
- Specific DNA sequence located upstream (= in the 5′ region) of a gene that regulates transcription
- Contains consensus sequences such as the AT-rich TATA box and the CAAT box
- Binding site for RNA polymerase II and several transcription factors at the start of transcription
- Promoter mutations usually lead to a severely decreased transcription rate.
- Exon-intron structure: eukaryotic genes are composed of alternating coding and noncoding regions
- Introns: contain noncoding DNA sequences but can be involved in the regulation of gene expression
- Exons: contain protein-coding DNA sequences
- Substrates: the nucleoside triphosphates ATP, GTP, CTP, and UTP
- Enzymes: RNA polymerases
- General transcription factors: specific helper proteins that help RNA polymerase find and bind to the promoter and initiate RNA synthesis
“Introns are intervening introverts”: Introns are found between (Latin “inter”) protein-coding DNA sequences and stay in the nucleus.
“Exons are expressive extroverts”: Exons contain protein-coding DNA sequences that will be expressed and exit the nucleus.
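To illustrate how a promoter element such as the TATA box can be recognized by its base sequence, the following Python sketch scans one DNA strand for TATA-box-like motifs. It assumes the commonly cited consensus TATA(A/T)A(A/T); the function name `find_tata_boxes` and the example sequence are hypothetical, and real promoter recognition by transcription factors is far less rigid than an exact pattern match.

```python
import re

# Assumed TATA-box consensus: TATA(A/T)A(A/T).
# The zero-width lookahead (?=...) also reports overlapping matches.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT])")

def find_tata_boxes(strand: str) -> list:
    """Return 0-based start positions of TATA-box-like motifs on one DNA strand (5'->3')."""
    return [match.start() for match in TATA_PATTERN.finditer(strand.upper())]

# Hypothetical promoter fragment, written 5'->3' on the sense strand.
promoter = "GGCCAATCTGCGCTATAAAAGGCGCATG"
print(find_tata_boxes(promoter))  # [13]
```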
RNA polymerases and transcription factors
RNA polymerases
Transcription reactions are catalyzed by (DNA-dependent) RNA polymerases. In eukaryotic cells, there are various types of RNA polymerase, which recognize different promoter types and transcribe different types of genes. In prokaryotes, on the other hand, there is only one type of RNA polymerase that transcribes all three types of RNA.
- Structure: large complexes composed of two large subunits and several smaller subunits (many polypeptide chains)
- Function: synthesis of the new RNA strand in the 5′ → 3′ direction while the template DNA strand is read in the 3′ → 5′ direction
- Unwinds DNA without the help of another enzyme (intrinsic helicase activity)
- Initiates transcription (RNA polymerase II opens DNA in the promoter region)
- Has only limited proofreading activity (lower fidelity than DNA polymerases)
Overview of RNA polymerases
Type of RNA polymerase | Transcripts | Location |
---|---|---|
RNA polymerase I (most common type) | rRNA | Nucleolus |
RNA polymerase II | mRNA | Nucleus |
RNA polymerase III | tRNA | Nucleus |
Mitochondrial RNA polymerase | Transcripts of mitochondrial DNA | Mitochondrion |
RNA polymerase II transcribes almost all genes that code for proteins.
The RNA polymerases are numbered in the order in which their products are utilized in the process of protein synthesis! I, II, and III → rRNA, mRNA, and tRNA, respectively.
Transcription factors
RNA polymerases require helper proteins for promoter recognition of the genes to be transcribed.
- General transcription factors: bind to specific base sequences in the proximal promoter region of chromosomal DNA and enable RNA polymerase to bind → start of transcription
- Specific transcription factors
- Modulate transcription by binding to regulatory elements (enhancers, silencers)
- Example: steroid hormone receptors
DNA-binding proteins
DNA-binding proteins such as transcription factors require specific protein domains (structural motifs) to bind DNA. These structural motifs usually use either an α-helix or a β-sheet to bind to the major groove of DNA. Through their DNA-binding domains, transcription factors interact with specific DNA segments to perform their function. Numerous structural motifs of DNA-binding domains have been identified; important examples are zinc fingers, leucine zippers, the basic helix-loop-helix motif, and the homeobox.
- Zinc finger
- Leucine zipper
- Characteristics
- Two long α-helices that bind to one another through their hydrophobic regions and form a coiled coil (supercoil)
- Because every seventh amino acid residue is leucine and the residues intertwine like a zipper, this structural motif is termed leucine zipper.
- DNA binding: The DNA-binding hydrophilic regions of α-helices contain many basic residues that interact with the major groove of DNA.
- Basic helix-loop-helix
- Characteristics
- Two polypeptide chains, each comprising a short and a long α-helix connected by a flexible loop (without a defined secondary structure)
- The two polypeptide chains dimerize via their helix-loop-helix regions.
- DNA binding: The short basic α-helix interacts with DNA.
- Homeobox (with helix-turn-helix)
An important structural motif of DNA-binding proteins is an α-helix with many basic amino acid residues.
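The leucine zipper description (a leucine at every seventh residue) can be turned into a simple sequence check. The Python sketch below is an illustration only: it assumes one-letter amino acid codes, the function name `has_leucine_heptads` and the threshold of four consecutive heptad leucines are arbitrary choices, and real coiled-coil prediction considers more than leucine spacing.

```python
def has_leucine_heptads(protein: str, min_repeats: int = 4) -> bool:
    """Return True if leucine (L) occurs at every seventh position at least
    `min_repeats` times in a row, a rough hallmark of a leucine zipper."""
    seq = protein.upper()
    for start in range(len(seq)):
        count = 0
        pos = start
        # Step through the sequence in strides of 7 while the residue is leucine.
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            return True
    return False

# Hypothetical sequences built from a repeated heptad.
print(has_leucine_heptads("LKAEIDA" * 4))  # True: L at positions 0, 7, 14, 21
print(has_leucine_heptads("LKAEIDA" * 3))  # False: only three heptad leucines
```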
Stages of transcription
Transcription is divided into three phases: initiation, elongation, and termination.
- Initiation: start of transcription through formation of the initiation complex and unwinding of DNA
- Preinitiation complex (RNA polymerase-promoter closed complex) formation by binding of general transcription factors and RNA polymerase to the promoter region (e.g., TATA box, CAAT box, GC box)
- Formation of a transcription bubble by unwinding 10–12 base pairs of the DNA double helix into single strands (open complex)
- Start of RNA synthesis
- Elongation
- Extension of the RNA strand
- The 3′-OH group of the growing RNA strand attacks the α-phosphate group of the next complementary nucleoside triphosphate, forming a phosphodiester bond and releasing pyrophosphate.
- Termination: RNA synthesis stops at the termination site; in eukaryotic mRNA, termination is coupled to cleavage of the transcript and the start of polyadenylation.
During transcription, base pairing occurs between DNA and RNA. Uracil (instead of thymine) in RNA pairs with adenine in DNA.
RNA and DNA pair in an antiparallel orientation: the 5′ end of one strand lies opposite the 3′ end of the other strand and vice versa. By convention, both base sequences are written in the 5′ → 3′ direction.
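The base-pairing and antiparallel rules can be sketched in Python. In this illustration (the function names and example sequences are hypothetical), the template (antisense) strand is given 5′ → 3′; because RNA polymerase reads the template 3′ → 5′ and builds RNA 5′ → 3′, the sketch walks the template from its 3′ end and pairs each base, with uracil placed opposite adenine. The result equals the sense strand with T replaced by U.

```python
# Pairing of template DNA bases with incoming RNA nucleotides (uracil pairs with adenine).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (5'->3') synthesized from a template strand written 5'->3'."""
    # Reading the template from its 3' end yields the RNA in 5'->3' order.
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))

def sense_to_mrna(sense_5_to_3: str) -> str:
    """The sense strand matches the mRNA except that thymine is replaced by uracil."""
    return sense_5_to_3.upper().replace("T", "U")

template = "TACGGCCAT"  # hypothetical antisense strand, 5'->3'
sense = "ATGGCCGTA"     # its complementary sense strand, 5'->3'
print(transcribe(template))   # AUGGCCGUA
print(sense_to_mrna(sense))   # AUGGCCGUA
```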
Post-transcriptional modification (RNA processing)
In eukaryotes, the end-product of transcription is heterogeneous nuclear RNA (hnRNA), which is then transformed into mature mRNA through posttranscriptional modifications in the nucleus. These modifications include capping, polyadenylation, splicing, and RNA editing. mRNA then leaves the nucleus and enters the cytosol.
Capping
- Definition: addition of a 7-methylguanosine cap to the 5′ end of hnRNA (5′ cap)
- Process
- Removal of the terminal 5′-phosphate group by RNA triphosphatase
- Addition of a GMP residue (formed from GTP with cleavage of pyrophosphate) to the 5′ diphosphate end of hnRNA by guanylyltransferase
- Methylation with S-adenosylmethionine (SAM) as the methyl group donor: the added guanine is methylated at N7, and, depending on the cap type, the ribose of the first one or two nucleotides of hnRNA is also methylated (one, two, or three methyl groups in total)
- Function
- Protects against degradation by exonucleases
- Facilitates initiation of translation
Polyadenylation
- Definition: addition of a tail of ∼200 adenosine monophosphates (poly(A) tail) to the 3′ end of hnRNA
- Process
- Polyadenylation signal on hnRNA: AAUAAA
- Poly(A) polymerase
- Binds to the cleavage site and adds ∼50–250 adenosine monophosphates, using ATP as the substrate
- Does not need a template for polyadenylation
- Function
- ↑ Stability (protects against early degradation)
- Facilitates initiation of translation
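The polyadenylation steps can be modeled as simple string operations on an RNA sequence. This Python sketch is hypothetical in its details: the function name `polyadenylate`, the assumption that cleavage occurs a fixed number of nucleotides downstream of the AAUAAA signal (in cells, typically about 10–30 nucleotides downstream), and the fixed tail length are simplifications; real poly(A) tails vary in length.

```python
POLY_A_SIGNAL = "AAUAAA"

def polyadenylate(hnrna: str, cleavage_offset: int = 20, tail_length: int = 200) -> str:
    """Cleave hnRNA downstream of the first AAUAAA signal and add a poly(A) tail.

    Assumptions: cleavage happens `cleavage_offset` nucleotides after the end of
    the signal, and the template-independent tail has a fixed `tail_length`."""
    signal_start = hnrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    # Everything 3' of the cleavage site is discarded; adenylates are appended.
    return hnrna[:cleavage_site] + "A" * tail_length

transcript = "GCUAGC" + "AAUAAA" + "CGUCG" + "UUUUUUUU"  # hypothetical 3' region
print(polyadenylate(transcript, cleavage_offset=5, tail_length=10))
# GCUAGCAAUAAACGUCGAAAAAAAAAA
```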
Splicing
Overview
- Definition: excision of introns from hnRNA transcripts and direct linkage of exons
- Function: excision of introns so that the resulting mature mRNA only contains relevant information in the form of exons
Process
- Spliceosome formation at the exon-intron border
- Complex of:
- Various snRNAs that are bound to proteins and form snRNPs (small nuclear ribonucleoproteins)
- Pronounced “snurps”
- Antibodies against snRNPs can be found in SLE (anti-Smith antibodies) and mixed connective tissue disease (anti-U1 RNP antibodies)
- The hnRNA to be modified
- Many other small proteins
- Sequence segments on the hnRNA involved in splicing:
- Exon-intron borders: characterized by specific base sequences (consensus sequences) on the RNA
- 5′ splice site
- 3′ splice site
- Branch point: adenine nucleotide located in the intron, at which a lariat structure is formed (see below)
- Pyrimidine-rich sequence upstream of the 3′ splice site
- In some forms of beta-thalassemia, mutations at intronic splice sites of the β-globin gene result in improper splicing and thus in the expression of abnormal β-globin.
- Defective snRNP assembly can lead to congenital conditions such as spinal muscular atrophy, in which assembly is impaired due to decreased SMN protein.
- Cleavage at the 5′ splice site: the 5′ end of the intron is linked to the branch-point adenine via a 2′ → 5′ phosphodiester bond, forming a temporary lariat structure that brings the two exon ends to be joined into proximity (loop formation)
- Cleavage at the 3′ splice site and release of the intron lariat
- Joining of the exon ends
The exons of a gene are the coding segments; the introns are removed from hnRNA by splicing.
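The net effect of splicing (removing introns and joining exons) can be sketched as follows. The exon coordinates are assumed to be known in advance, and the function name `splice` and the sequence are hypothetical. As an extra check, the sketch requires each intron to start with GU and end with AG, the canonical consensus dinucleotides at the 5′ and 3′ splice sites; real splice-site recognition by the spliceosome also depends on the branch point and the pyrimidine-rich sequence.

```python
def splice(hnrna: str, exons: list) -> str:
    """Join exons given as (start, end) pairs (0-based, end-exclusive, in 5'->3' order)."""
    # Check each intron between consecutive exons for the canonical GU...AG ends.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = hnrna[prev_end:next_start]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {prev_end}-{next_start}: {intron}")
    return "".join(hnrna[start:end] for start, end in exons)

# Hypothetical hnRNA: exon 1 + intron + exon 2
hnrna = "AUGGCC" + "GUAAGUACUAACCAG" + "UUUGAA"
print(splice(hnrna, [(0, 6), (21, 27)]))  # AUGGCCUUUGAA
```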
RNA editing
- Definition: alteration of RNA base sequences by the insertion, deletion, or modification of individual bases (independent of splicing)
- Function: allows different proteins to be produced from the same transcript
- Examples
- A-to-I editing: adenosine is deaminated to inosine, i.e., the base adenine is converted to hypoxanthine
- C-to-U editing: cytidine is deaminated to uridine, i.e., the base cytosine is converted to uracil
- Occurs in mRNA
- Typical example of C-to-U editing
- The mRNA for apolipoprotein B (apoB) codes for apoB-100.
- After editing, the mRNA for apoB codes for a markedly smaller protein, apoB-48: cytidine deaminase converts a cytidine to uridine and thereby generates a premature stop codon.
- As a result of C-to-U editing, enterocytes produce apoB-48, whereas hepatocytes produce apoB-100.
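How a single C-to-U edit can shorten a protein is shown in the Python sketch below. The sequence is a hypothetical toy mRNA, not the real apoB transcript; it mirrors the apoB mechanism, in which a CAA codon (glutamine) is edited to the stop codon UAA. The function names are illustrative.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}

def c_to_u_edit(mrna: str, position: int) -> str:
    """Deaminate the cytidine at `position` (0-based) to uridine."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a cytidine at this position")
    return mrna[:position] + "U" + mrna[position + 1:]

def first_stop_codon(mrna: str) -> Optional[int]:
    """Return the index of the first in-frame stop codon (frame starting at 0), or None."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

toy_mrna = "AUGGCUCAAGCUGCUUAA"   # codons: AUG GCU CAA GCU GCU UAA
edited = c_to_u_edit(toy_mrna, 6)  # CAA -> UAA
print(first_stop_codon(toy_mrna))  # 5
print(first_stop_codon(edited))    # 2
```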
Alternative splicing
- Definition: removal of introns within hnRNA with differential joining of exons
- Process: similar to splicing, with additional splicing factors that determine which splice sites are used
- Function: production of several different proteins from a single gene
- Examples
- Different types of tropomyosin (muscle)
- Dopamine receptors (brain)
- Immunoglobulins (secreted versus membrane-bound)
The one gene-one enzyme hypothesis does not strictly apply to eukaryotes: a variety of proteins can be formed from one gene by alternative splicing.
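One common mode of alternative splicing is exon skipping, in which optional (cassette) exons are either included or removed. The Python sketch below enumerates the mRNA isoforms of a hypothetical gene with two constitutive and two optional exons; it ignores other modes such as alternative 5′ or 3′ splice sites and is meant only to show why one gene can yield several proteins.

```python
from itertools import product

# Hypothetical gene: (exon name, is_optional). Constitutive exons are always kept.
EXONS = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]

def enumerate_isoforms(exons):
    """Return every exon combination obtainable by including or skipping optional exons."""
    optional = [name for name, is_optional in exons if is_optional]
    isoforms = []
    for choice in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choice))
        isoforms.append("-".join(
            name for name, is_optional in exons if not is_optional or included[name]
        ))
    return isoforms

print(enumerate_isoforms(EXONS))
# ['E1-E2-E3-E4', 'E1-E2-E4', 'E1-E3-E4', 'E1-E4']
```

With n optional exons, exon skipping alone can yield up to 2^n isoforms, which is why the number of possible proteins grows quickly with the number of cassette exons.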
Quality control of mRNA
- Location: cytoplasmic processing bodies (P-bodies)
- Contain exonucleases, decapping enzymes, and microRNAs
- Function
- Degradation of mRNA
- Storage for future translation
Regulation of transcription
Because transcription and protein synthesis require large amounts of energy, gene expression is strongly regulated. While some genes are continuously transcribed, other genes undergo regulation.
Prokaryotic gene regulation (operon model)
Regulation of gene expression was first studied in E. coli. Regulatory sequences in the bacterial genome ensure that the enzyme β-galactosidase is expressed when the sugar lactose is available as an energy source. Other proteins associated with lactose metabolism are synthesized at the same time, so this regulation involves the coordinated expression of several genes.
- Definition: a model describing the gene regulatory mechanism in prokaryotes
- An operon is a transcriptional unit of prokaryotic DNA composed of regulatory elements and several protein-coding genes that are transcribed together.
- A single polycistronic mRNA is formed.
- Function: adapt to changing environmental conditions by simultaneously increasing the expression of certain related genes
- Example: lac operon
- Description: a transcriptional unit of genes for enzymes involved in lactose metabolism (e.g., β-galactosidase) that is only expressed in the presence of lactose. The lac operon is a classic example of how the environment elicits a genetic response.
- Components (in their order in the genome)
- Regulatory gene lacI: does not directly belong to the lac operon but codes for a repressor protein that binds to the lac operator in the absence of lactose and prevents transcription
- Promoter: binding site for catabolite activator protein (CAP) and RNA polymerase in transcription
- Operator: binding site of the repressor that overlaps with the promoter
- lacZ: β-galactosidase gene
- lacY: permease gene
- lacA: transacetylase gene
- Regulation
- Presence of glucose and absence of lactose → the lac repressor binds to the operator → RNA polymerase cannot bind the promoter → transcription cannot take place → very few β-galactosidase molecules in the cell
- Absence of glucose and presence of lactose → repressor released from the operator and CAP-mediated activation → ↑ transcription
- Presence of glucose and lactose: very low basal expression of lac genes
In the lac operon, the repressor binds to the operator and prevents transcription of the operon genes in the absence of lactose.
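The regulatory logic of the lac operon can be summarized as a small truth table. The following Python sketch is a qualitative simplification: it treats lactose and glucose as simply present or absent, and it assumes the standard mechanism by which CAP only activates transcription when glucose is low (high cAMP levels). The function name and output labels are hypothetical.

```python
def lac_operon_expression(lactose: bool, glucose: bool) -> str:
    """Qualitative expression level of the lac genes (lacZ, lacY, lacA)."""
    repressor_bound = not lactose   # LacI binds the operator when lactose is absent
    cap_active = not glucose        # CAP activates transcription only when glucose is low
    if repressor_bound:
        return "very low"           # transcription blocked by the repressor
    if cap_active:
        return "high"               # repressor released and CAP-mediated activation
    return "low (basal)"            # repressor released, but no CAP activation

for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose}, glucose={glucose}: {lac_operon_expression(lactose, glucose)}")
```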
Eukaryotic gene regulation
Regulation of gene expression is significantly more complicated in eukaryotes than in prokaryotes. One reason is the difference in genome size: eukaryotes have a significantly larger genome. Another reason is that eukaryotic DNA in the nucleus is strongly condensed and packaged as chromatin, which makes it less accessible than prokaryotic DNA. However, a common feature of eukaryotes and prokaryotes is the importance of activators and repressors, which bind specific DNA sequences and increase or inhibit gene expression.
- Distal regulatory elements: DNA sequences that can affect the transcription rate of a gene and can be located upstream of, within (e.g., in an intron), or downstream of the gene they regulate
- Enhancers
- Short DNA sequences ∼20 bp in length
- Often palindromic or tandemly repeated
- When specific transcription factors (activators) bind to enhancers, the transcription rate of a gene on the same chromosome increases.
- These transcription factors may be ligand-dependent or ligand-independent.
- Ligand-dependent transcription factors: Intracellular hormone receptors that interact with enhancer sequences after hormone binding in the nucleus and increase the transcription rate of the genes to be controlled.
- Example of an enhancer: hypoxia-response element (HRE)
- The transcription factor hypoxia-inducible factor (HIF) binds to the HRE sequence during hypoxia and induces certain target genes that are important in the response to hypoxia, e.g., expression of EPO and VEGF.
- In normoxia (sufficient amount of oxygen), HIF is hydroxylated by HIF prolyl hydroxylase. Hydroxylated HIF is ubiquitinated and degraded in the proteasome and therefore cannot induce its target genes.
- Silencer
- Specific DNA sequence
- When specific transcription factors (repressors) bind to silencers, the transcription rate of a gene on the same chromosome decreases.
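Enhancers are often palindromic. For double-stranded DNA, a palindrome is a sequence that reads the same 5′ → 3′ on both strands, i.e., it equals its own reverse complement. The Python sketch below checks this property; the function names and example sequences are hypothetical and not taken from a specific enhancer.

```python
# Translation table for DNA base complementarity.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def is_dna_palindrome(dna: str) -> bool:
    """True if the sequence equals its reverse complement."""
    return dna.upper() == reverse_complement(dna)

print(is_dna_palindrome("GAATTC"))  # True
print(is_dna_palindrome("GAATTG"))  # False
```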
Transcriptional inhibitors
Transcriptional inhibitors are potent cytotoxins, but some of them can also be used therapeutically, e.g., as antibiotics.
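Inhibitor | Mechanism | Occurrence/use |
---|---|---|
α-amanitin | Inhibits RNA polymerase II | Toxin of the death cap mushroom (Amanita phalloides) |
Rifampicin | Inhibits bacterial DNA-dependent RNA polymerase | Antibiotic (e.g., in tuberculosis treatment) |
Actinomycin D (dactinomycin) | Intercalates into DNA and blocks transcription | Chemotherapeutic agent |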