PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA...

49
PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human genome make up only 1.5% what is the rest doing?

Transcript of PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA...

Page 1: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

PHAR2811:Genome Organisation

Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of

the human genome make up only 1.5% what is the rest doing?

Page 2: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

The C-value paradox

• The C-value is the total number of DNA nucleotide residues in the genome (per haploid set of chromosomes).

• When you compare this to the complexity of the organism you find a massive disparity.

• Clearly the amount of DNA is not proportional to that required to produce all the proteins made by the organism.

Page 3: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

What do these life forms have in common?

Page 4: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Remember our calculations

• The E. Coli genome has 4.6 million base pairs and codes for about 3,000 different proteins (proteins of ~40,000 and 500 bp for promoters)

• Using the same assumptions the human genome should code for 1 million proteins (3 billion base pairs (3*10^9), protein ~50,000 and promoters of 1500 bp)

• Humans only have ~30,000 coding “genes”

Page 5: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Is there too much DNA?

• Only about 1 - 2% of the DNA in the genome actually codes for proteins. What is the rest of it doing?

• Some clues come from re-annealing experiments.

• The time it takes for DNA to re-anneal depends on the complexity of the sequence

Page 6: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Cot plots

• You isolate the DNA from a species and cut it up to manageable lengths and then you melt it.

• Once it is fully single stranded you monitor, by absorbance at 260 nm, the re-annealing kinetics of the sample.

• The Cot½ is the time taken (*Co) for half the

DNA to be reannealed.

Page 7: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Cot plots

• You need to account for the starting concentration (Co).

• Obviously if you started with more of a sequence it would anneal quicker….more matches to find each other.

• By using the Cot value you can plot (on the same graph) different annealing experiments with different starting concentrations (10 – 2 000 ug/mL) from the same source and they will all lie of the graph.

Page 8: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Re-annealing kinetics

Co*Time (M*s)

A260

Single stranded DNA

Double stranded DNA

Cot½

Page 9: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Cot plots

• The rate of re-association, hence the time taken to renature is dependent on the complexity of the DNA sequence.

• The complexity is defined as the number of bases in each unique sequence e.g. poly (U)+poly (A) has a complexity of 1, the repeating sequence AGTGCn has a complexity of 5.

Page 10: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Different DNA samples

Fraction single

stranded

Cot

Increasing complexity

Page 11: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Cot plots

• The Cot1/2 for a given DNA depends on the complexity.

• Eukaryotic genomic DNA can be divided up into 4 classes: highly repetitive (hundreds to millions of copies), moderately repetitive (10s to hundreds of copies), slightly repetitive (1 – 10 copies) and single copy sequences.

• The last 2 are often combined

Page 12: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Cot plot for a mixed population:similar to the human genome

Highly repetitive

Moderatelyrepetitive

unique

Page 13: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Highly repetitive DNA

• Short sequences arranged in tandem repeats, sometimes thousands of times.

• Short Tandem Repeats (STRs) or satellite DNA

• Microsatellites 1 – 13 nucleotides

• Minisatellites 14 – 500 nucleotides

• Often found clustered around the centromere or the telomere.

Page 14: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Moderately repetitive DNA

• Segments of 100 to several thousand base pairs repeated

• Repeated groups of genes whose products are needed by cells in large quantities e.g. histones, ribosomal and transfer RNA (although these are sometimes classified in the highly repetitive group)

• Retrotransposons, DNA which has been transcribed in reverse back from RNA

Page 15: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Retrotransposons

• Around 40% of the human genome• LINES (long interspersed nuclear

elements) 6 – 8 kb segments that encode the proteins that enable the transposition

• SINES (short interspersed nuclear elements) 100 – 400 bp sections containing remnants of tRNA transcription machinery.

• LTR retrotransposons or long terminal repeats

Page 16: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Gene Families

• Most genes in the genome are only represented once.

• Some have a few copies on the genome.• One example is the globin family. This set

of genes contains a number of closely related sequences which vary by only a few changes in the code.

• Sometimes found clustered together on the one chromosome (but not always!)

Page 17: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Single copy genes

• Most of the genes of the organism are single copy genes

• But they only make up a small proportion of the total genome.

• They are the most complex group and hence take the longest to re-anneal.

Page 18: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

What else helps explain the C-value paradox?

• So our genome contains highly repetitive DNA which doesn’t code for proteins.

• It also contains some multi-gene families and multiple copies of some genes.

Page 19: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

What else helps explain the C-value paradox?

• This makes up 40% of the genome by Cot plot analysis.

• What of the other 60% unique sequences?? Remember only 1 – 2% of the genome is coding sequence.

• How do we account for this discrepancy??

Page 20: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

What else helps explain the C-value paradox?

• Eukaryotic genes have introns which are removed after transcription by processing.

• These introns can comprise some 90% of the gene.

Page 21: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

What else helps explain the C-value paradox?

• Pseudogenes; thought to be relics from ancient gene duplications or reverse transcription events.

• They do not produce functional proteins.

• They are sometimes considered as part of the retrotransposon group.

Page 22: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Repetitive DNA and disease.

• Trinucleotide repeats (TNR) are a specialised type of repeat sequence found in the genome.

• They arise from mutations during replication, repair or recombination.

• Both germline cells (sperm and ova, meiotic) and somatic (mitotic)

Page 23: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Static Mutations

• A mutation event in the germline which is stably transmitted to all somatic cells in the next generation.

• Each somatic cell has a copy of the mutation in their genome.

• Examples are sickle cell anemia, Cystic fibrosis, PKU

Page 24: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Dynamic Mutations

• Unlike static mutations dynamic mutations change; they continue to mutate between different tissues and across generations.

• The longer the tract length (i.e. number of repeats) the more likely the repeat is going to continue to mutate.

Page 25: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Dynamic Mutations

• This leads to increased severity with successive generations or in some diseases, the age of onset decreases.

• In other words with each generation the disorder becomes worse and/or you start to get the symptoms earlier. This leads to genetic anticipation.

Page 26: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Repetitive DNA and disease.

• There are an increasing number of genetic disorders: many of them are neurological disorders.

• They result from expansion of trinucleotide repeats (this means multiple copies of a 3 nucleotide sequence).

• The repeat is characteristically CNG

Page 27: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

CNGCNGCNGCNG

GNCGNCGNCGNC

Single stranded loop formed by TNRS

Page 28: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.
Page 29: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Where do the expansions insert?

5’UTR 3’UTR

5’UTR 3’UTR

The gene

The mRNACoding, exon

intron

Page 30: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Fragile X

5’UTR 3’UTR

5’UTR 3’UTR

The silenced gene would have made the protein FMRP

CpGG expanded >200 copies methylation gene silencing

Page 31: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

The result of this repeat mutation

• If the expansion occurs in a non-coding region it often leads to loss of function or gene silencing

• The insertion prevents the expression of the gene

• An example is Fragile X syndrome

Page 32: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s Disease

5’UTR 3’UTR

5’UTR 3’UTR

CAG repeat, >28 copies

(CAG)n

QQQQ

A string of glutamines in the protein Huntingtin

Page 33: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

PolyGlutamine or PolyQ

HN CH C

CH2

O

CH2

C

NH2

O

HN CH C

CH2

NH

O

CH2

C

NH2

O

CH C

CH2

O

CH2

C

NH2

O

Page 34: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

The result of this repeat mutation

• If the expansion occurs in a coding region the result is usually gain of function for the protein

• The protein has an altered function as it now has a whole bunch of amino acids added to the code

• Example is Huntington’s disease

Page 35: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Fragile X syndrome (FRAXA)

• One of the most common inheritable forms of mental retardation

• The fragile X syndrome (FRAXA) results from multiple copies of the sequence CGG (the expansion) in the 5’ UTR of the fragile X syndrome gene, FMR1

• Results in gene silencing of the FMR1 gene product, FMRP.

Page 36: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

FMRP: the protein product of the FMR

Page 37: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Fragile X syndrome

• The number of repeats is very important to the final severity of the disease.

• 5 – 50 copies has no effect,

• 50 – 200 results in an intermediate and distinct syndrome, fragile X tremor/ataxia (FXTAS)

• >200 copies gives rise to the full blown mutation.

Page 38: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

RNA binding domain

Nuclear Localisation signal (NLS)

5’

3’

>200 nt (full mutation)

50 – 200 nt

<50 nt

(CGG)n

FMRPN C

Gene FMR1

Page 39: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

FMRP: the protein product of the FMR1 gene

• It is located largely in the cytoplasm but does make excursions to the nucleus.

• It has 3 RNA binding domains; it associates with ribosomes and seems to be involved in translational regulation of a group of RNA targets.

Page 40: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

FMRP: the protein product of the FMR1 gene

• The protein and its mRNA localise in dendritic spines.

• The protein binds to certain “target” RNA species and represses translation.

• These are thought to be mRNAs coding for proteins involved in neuronal development, synaptic transmission and cytoskeleton

Page 41: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s disease

“George Huntington (1850-1916) described the condition while working as a newly qualified doctor in the rural general practice of his father and grandfather on Long Island, New York State. Together their observations covered 78 years. George Huntington did not continue working on Hereditary chorea but went into general practice in Ohio.” He never published another paper in his life - yet his name is remembered from the single one he did write.

Page 42: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s disease

Page 43: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s disease

• A second example is the expansion of the repeats in the coding region of a gene.

• This alters the function of the protein product, referred to as “gain of function”.

• There are 9 separate disorders are associated with the expansion of a CAG repeat in the coding region of various proteins.

Page 44: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s disease

• CAG codes for glutamine and the expansion results in multiple copies of glutamine in the affected protein (polyglutamine disorders).

• This gain of function outcome accounts for the neurodegenerative symptoms of Huntington’s disorder.

Page 45: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

How does the mutation account for the symptoms or the

pathogenesis?

• The result of the polyglutamine in the protein is a misfolded protein.

• The protein aggregates and is sequestered into inclusion bodies complete with the chaperones.

• This is thought to eventually overload the chaperone and ubiquitin systems.

Page 46: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

How does the mutation account for the symptoms or the

pathogenesis?

• BUT the inclusion bodies may be a protective response

• the mutant protein may initiate a cascade of aberrant protein:protein interactions which affect many processes resulting in neuronal dysfunction and death (as always).

Page 47: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s Disease

• The HD gene was discovered in 1993• There is now a direct genetic test to make

or confirm a diagnosis of HD in an individual who is exhibiting HD-like symptoms.

• Using a blood sample, the genetic test analyzes DNA for the HD mutation by counting the number of repeats in the HD gene region.

Page 48: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s Disease

• Individuals who do not have HD usually have 28 or fewer CAG repeats.

• Individuals with HD usually have 40 or more repeats.

• A small percentage of individuals, however, have a number of repeats that fall within a borderline region.

Page 49: PHAR2811: Genome Organisation Synopsis: C-value paradox, different classes of DNA, repetitive DNA and disease. If protein-coding portions of the human.

Huntington’s Disease

No. of CAG repeats Outcome<28

Normal range; individual will not develop HD

29-34Individual will not develop HD but the next generation is at risk

35-39Some, but not all, individuals in this range will develop HD; next generation is also at risk

>40

Individual will develop HD